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1. Introduction 

Edge detection forms the first stage in a very large number of vision modules, 
and any edge detector should be formulated in the appropriate context. However, 
the requirements of many modules are similar and it seems as though it should be 
possible to design one edge detector that performs well in several contexts. The 
crucial first step in the design of such a detector should be the specification of a 
set of performance criteria that capture these requirements. The specification of 
these criteria and the derivation of optimal operators from them forms the subject 
of this report. 

The operation of the edge detector is best illustrated by the example in figure 
(1.1), which was produced by the detector described in this report. The detector 
accepts discrete digitized images and produces an "edge map" as its output. The 
edge map includes explicit information about the position and strength of edges, 
their orientation, and the "scale" at which the change took place. Although they 
are not made explicit, it is also possible to compute the uncertainty in position or 
strength of an edge from the quantites in the edge map. The example in figure (1.1) 
includes position information only. 

A digitized image contains a great deal of redundancy. There is redundancy in 
the information theoretic sense (it is possible to compress the sampled data into fewer 
bits without changing the reconstructed image significanty). Even after efficient 
encoding, much of the what remains is not useful to later vision modules. These 
modules typically require structural information, i.e. details of surface orientation 
and the material of which the visible surfaces comprise. Where the surfaces are 
smooth and of uniform reflectance, shape from shading (Horn 1975) may be applied 
to obtain surface orientation. In many other modules such as shape from motion 
(Ullman 1979 and Hildreth 1983), shape from contour (Stevens 1980), shape from 
texture (Witkin 1980), and Stereo (Marr and Poggio 1979, Grimson 1981) structural 
properties of underlying surfaces are inferred from edge contours. In particular, 
step changes in intensity are important because they typically correspond to sharp 
changes in orientation or material, or to object boundaries. Edge detection is a 
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Figure 1.1. Positional information provided by the edge detector applied to an 
image of some mechanical parts 



means of generating compact descriptions which preserve most of the structural 
information in an image. 

Some previous formulations have chosen the first or second derivative as the 
appropriate quantity to characterize step edges, and have formed optimal estimates 
of this derivative over some support. Examples of first derivative operators are the 
operators of Roberts (1965) and Macleod (1970), while Modestino and Fries (1977) 
formed an optimal estimate of the two-dimensional Laplacian over a large support. 
Marr and Hildreth (1980) suggested the Laplacian of a broad Gaussian since it 
optimizes the trade-off in localization and bandwidth. There are problems with the 
Laplacian however, and the whole concept of derivative estimation seems to have 
poor foundation. These criticisms will be made specific in chapter 7. 

There is a second major class of formulations in which the image surface is 
approximated by a set of basis functions and the edge parameters are estimated 
from the modelled image surface. Examples of this technique include the work of 
Prewitt (1970), Hueckel (1971) and Haralick (1982). These methods allow more 
direct estimates of edge properties such as position and orientation, but since the 
basis functions are usually not complete, the properties apply only to a projection 
of the actual image surface on to the subspace spanned by the basis functions. 
However, the basis functions are a major factor in operator performance, especially 
its ability to localize edges. 

In this report we begin with a traditional model of a step edge in white Gaussian 
noise and try to formulate precisely the criteria for effective edge detection. We 
assume that detection is performed by convolving the noisy edge with a spatial 
function f(x) (which we are trying to find) and by marking edges at the maxima in 
the output of this convolution. We then specify three performance criteria on the 
output of this operator. 

(i) Good detection. There should be a low probablity of failing to mark real edge 
points, and low probability of falsely marking non-edge points. Since both 
these probabilities are monotonically decreasing functions of the output signal 
to noise ratio, this criterion corresponds to maximizing signal to noise ratio. 



(ii) Good localization. The points marked as edges by the operator should be as 
close as possible to the centre of the true edge. 

(iii) Only one response to a single edge. This is implicitly captured in (i) since 
when two nearby operators respond to the same edge, one of them must be 
considered a false edge. However, the mathematical form of the first criterion 
did not capture the multiple response requirement and it had to be made 
explicit. 

The first result of the analysis for step edges is that (i) and (ii) are conflicting 
and that there is a trade-off or uncertainty principle between them. Broad operators 
have good signal to noise ratio but poor localization and vice-versa. A simple 
choice of the mathematical form for the localization criterion gives a product of 
a localization term and signal to noise ratio that is constant. Spatial scaling of 
the function f(x) will change the individual values of signal to noise ratio and 
localization but not their product. Given the analytic form of a detection function, 
we can theoretically obtain arbitrarily good signal to noise ratio or localization from 
it by scaling, but not simultaneously. From the analysis we can conclude that there 
is a single best shape for the function / which maximizes the product and that if we 
scale it to achieve some value of one of the criteria, it will simultaneously provide 
the maximum value for the other. To handle a wide variety of images, an edge 
detector needs to use several different widths of operator, and to combine them in 
a coherent way. By forming the criteria for edge detection as a set of functional 
of the unknown operator /, we can use variational techniques to find the function 
that maximizes the criteria. 

The second result is that the criteria (i) and (ii) by themselves are inadequate 
to produce a useful edge detector. It seems that we can obtain maximal signal to 
noise ratio and arbitrarily good localization by using a difference of boxes operator. 
The difference of boxes (see figure 2.2) was suggested by Rosenfeld and Thurston 
(1971) and was used by Herskovits and Binford (1970). If we look closely at the 
response of such an operator to a noisy step edge we find that there is an output 
maximum close to the centre of the edge, but that there may be many others 
nearby. We have not achieved good localization because there is no way of telling 



which of the maxima is closest to the true edge. The addition of criterion (iii) 
gives an operator that has very low probability of giving more than one maximum 
in response to a single edge, and it also leads to a finite limit for the product of 
localization and signal to noise ratio. 

The third result is an analytic form for the operator. It is the sum of four 
complex exponentials and can be approximated by the first derivative of a Gaussian. 
A numerical finite dimensional approximation to this function was first found using a 
stochastic hill-climbing technique. This was done because it was much easier to write 
the multiple response criterion in deterministic form for a numerical optimization 
than as a functional of /. Specifically, the numerical optimizer provides candidate 
outputs for evaluation, and it is a simple matter to count the number of maxima 
in one of the outputs. To express this constraint analytically we need to find the 
expectation value of the number of maxima in the response to an edge, and to 
express this as a functional on /, which is much more difficult. The first derivative 
of a Gaussian has been suggested before (Macleod 1970). It is also worth noting 
that in one dimension the maxima in the output of this first derivative operator 
correspond to zero-crossings in the output of a second derivative operator. 

Several further results relate to the extension of the operator to two (or more) 
dimensions. They can be summarized roughly by saying that the detector should 
be directional, and if the image permits, the more directional the better. The issue 
of non-directional (Laplacian) versus directional edge operators has been the topic 
of debate for some time, compare for example Marr (1976) with Marr and Hildreth 
(1980). To summarize the argument presented here, a directional operator can be 
shown to have better localization than the Laplacian, signal to noise ratio is better, 
the computational effort required to compute the directional components is slight 
if efficient algorithms are used, and finally the problem of combining operators 
of several orientations is difficult but not intractable. It is, for example, much 
more difficult to combine the outputs of operators of different sizes, since their 
supports differ markedly. For a given operator width, both signal to noise ratio and 
localization improve as the length of the operator (parallel to the edge) increases, 
provided of course that the edge does not deviate from a straight line. When 



the image does contain long approximately straight contours, highly directional 
operators are the best choice. This means several operators will be necessary to 
cover all possible edge orientations, and also that less directional operators will also 
be needed to deal with edges that are locally not straight. 

The problem of combining the different operator widths and orientations is 
approached in an analogous manner to the operator derivation. We begin with 
the same set of criteria and try to choose the operator that gives good signal to 
noise ratio and best localization. We set a minimum acceptable error rate and 
then choose the smallest operator with greater signal to noise than the threshold 
determined by the error rate. In this way the global error rate is fixed while the 
localization of a particular edge will depend on the local image signal to noise ratio. 
The problem of choosing the best operator from a set of directional operators is 
simpler, since only one or two will respond to an edge of a particular orientation. 
The problem of choosing between a long directional operator and a less directional 
one is theoretically simple but difficult in practice. Highly directional operators are 
clearly preferable, but they cannot be used for locally curved edges. It is necessary 
to associate a goodness of fit measure with each operator that indicates how well 
the image fits the model of a linearly extended step. When the edge is good enough 
the directional operator output is used and the output of less directional neighbours 
is suppressed. 

While the detection of step edges is the primary goal of the report, chapter 4 
gives a general form for the optimality criteria. Using this general form, it is possible 
to design optimal operators for arbitrary features. A numerical optimization is 
used to find the impulse response of the operator given an input waveform to be 
detected. The technique is illustrated by the derivation of operators for ridge, roof 
and step edges. Of these the ridge and step detectors have been tested on real 
images. The particular problems of extending the one-dimensional ridge operator 
to two dimensions, and the problem of integrating the step and ridge detector 
outputs are discussed. 

Following the analysis we outline some simple experiments which seem to 
indicate that the human visual system is performing similar selections (at some 
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computational level), or at least that the computation that it does perform has 
a similar set of goals. We find that adding noise to an image lias the effect of 
producing a blurring of the image detail, which is consistent with there being several 
operator sizes. More interestingly, the addition of noise may enable perception of 
changes at a large scale which, even though they were present in the original image, 
were difficult to perceive because of the presence of sharp edges. Our ability to 
perceive small fluctuations in edges that are approximately straight is also reduced 
by the addition of noise, but the impression of a straight edge is not. 

As a guide to the reader, chapters 2 and 3 form the core of the analysis for 
step edges. They also contain most of the signal theory, and the general reader 
may wish to skim over them. The first section of chapter 3 should be read however, 
as it includes the translation of the theoretical operator into a practical algorithm. 
Chapter 4 is easier going and contains a more general form for the optimality 
criteria. It gives examples of the solution of the variational problem for roof and 
ridge edges. Chapter 5 is titled "details of implementation" and it may be tempting 
to avoid it as being too low-level. However it contains several efficient algorithms 
for Gaussian convolution, and may have applications outside the scope of the 
present work. Finally, chapters 6 and 7 give weight to the analysis by showing the 
performance of the operator on real images and by comparing it both experimentally 
and theoretically with some other edge detectors. 
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2. One-Dimensional Formulation for Step Edges 

The basic design problem is illustrated in figure (2.1). We are trying to detect 
a step edge which is bathed in Gaussian noise, figure (2.1a). We convolve with some 
spatial function (2.1b) and mark edges at maxima in the result of this convolution 
(2.1c). The objective is to find the spatial function (call it /) which gives the "best" 
output, where best is defined by a precise set of criteria on step edge detection. 

Some preliminaries on notation ; when we speak of an edge detection "operator" 
we mean a mapping from a one or two dimensional intensity function (the image 
or a linear slice through it) to an intensity function of the same dimension. If the 
operator is linear and shiR invariant, then it can be represented by a convolution of 
the intensity function with the "impulse response" (one dimension) or "point-spread 
function" (two dimensions) of the operator, which is the result of applying the 
operator to a unit impulse at the origin. Shift invariance is clearly a desirable 
property of an edge operator. To begin with we will consider only linear shift 
invariant operators and later we will apply decision procedures to their outputs, 
which will lead to shift invariant non-linear operators. The operator that describes 
the mapping from an image to the final representation of edge contours is called 
the "edge detector" . 

The key to the design of an effective edge operator is the accurate evaluation of 
its performance. If we can write down the evaluation function in closed mathematical 
form, we can apply standard tools such as the calculus of variations to find the 
operator that maximizes it. As with many optimization problems, the key to 
obtaining a useful answer is to ask the right questions. The edge detection problem 
is no exception, as should become apparent in the course of the derivation. Several 
passes at the evaluation function had to be made before one was found that closed 
all the "loopholes" and excluded operators that were impractical for "obvious" 
reasons. This is not to say that the problem became one of finding a question to 
fit a proposed solution, but rather that the question was always the same, it was 
just very difficult to express in a closed form that was simple enough to yield 
a variational problem that could be solved. By way of contrast, it was relatively 
easy to obtain a similar solution using a Monte Carlo optimization, because the 
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Figure 2.1. (a) The step edge model, (b) The detection function to be derived, 
(c) The result of trie convolution of this function with the edge. 



evaluation could be done directly on the output of a candidate operator. The real 
problem then, was the translation of the intuitive performance goals to functionals 
that depended directly on the form of the operator. This section describes the main 
stages in the translation process. 
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2.1. An Uncertainty Principle 

We consider first the one dimensional edge detection problem. The goal is to 
detect and mark step changes in a signal that contains additive white noise. We 
assume that the signal is flat on both sides of the discontinuity, and that there are 
no other edges close enough to affect the output of the operator (see figure 2.1). 
We need to somehow combine the two goals of accurate detection and localization 
into a single evaluation functional. The detection criterion is simple to express 
in terms of the signal to noise ratio in the operator output, i.e. the ratio of the 
output in response to the step input to the output in response to the noise only. 
The localization criterion is more difficult, but a reasonable choice is the inverse of 
the distance between the true edge and the edge marked by the detector. For the 
distance measure we will use the standard deviation in the position of the maximum 
of the operator output. By using local maxima we are making what seems to be 
an arbitrary choice in the mapping from linear operator output to detector output. 
But the mapping must involve some local predicate, and since we are designing a 
linear operator that will respond strongly to step edges, the maxima in its response 
are a logical choice. 

Let the amplitude of the step be A, and let the noise be n(x). Then the input 
signal I(x) can be represented as 

I(x) = Au-^x) + n{x) (2.1) 

where u—i(x) is the unit step function defined as 

f 0, for x < 
U, for x > 

Let the impulse response of the operator we are seeking be represented by 
the function f(x). Then the output O(x ) of the application of the operator to the 
input I(x) is given by the convolution integral, 



O{x ) = /_ I{x)f{x - x) dx (2.2) 
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We can use the linearity of convolution to split this integral into contributions due 
to the step and to noise only. The output due to the step only is (at the centre of 
the step, i.e. at xq = 0) 



f+oo /-O 

]_ /(i)Am-i(-x) dx = Aj_^f{x) 



(2.3) 



While the mean squared. response to the noise component only will be 



E 



• + 0O 



r-f-oo 

/ /(x)n( — x) dx 

J — oo 



where E[y] is the expectation value of y. If the noise is white the above 
simplifies to 



E 



/+oo "I r+oo 

f\x)n\-x) dx =n\ f\x) 

— 00 » — 00 



dx 



where n§ = l?[n 2 (x)] for all x, i.e. rig is the variance of the input noise. We define 
the output signal-to-noise ratio as the quotient of the response to the step only and 
the square root of the mean squared noise response. 



S.N.R. 



n yjl^f 2 (x)dx 



From this expression we can define a measure S of the signal to noise 
performance of the operator which is independent of the input signal 



S.N.R. 



and 



E = 



no 



}!„ /(«) dx 



(2.5) 



This then is the first part of our dual criterion, and finding the impulse response 
/ which maximizes it corresponds to finding the best operator for detection only. 

For the localization criterion we proceed as follows. Recall that we chose to 
mark edges at maxima in the output of the operator. For an ideal step we would 
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expect a single maximum at the centre of the edge. Since the signal I(x) contains 
noise we would expect this maximum to be displaced from the true position of the 
edge (at the origin in this case). To obtain a performance measure which improves 
as the localizing ability of the operator improves, we use the reciprocal of the 
standard deviation of the distance of the actual maximum from the centre of the 
true edge. This is not an arbitrary choice, as it gives a composite performance 
criterion which is scale independent, as we shall see. 

A maximum in the output 0(xq) of the operator corresponds to a zero-crossing 
in the spatial derivative of this output. We wish to find the position xq where 



O'(x ) = 4~ \ SW\&> ~ A dx = 

CLXq j — oo 

Which by the differentiation theorem for convolution can be simplified to 



/ f(x) I(x — x)dx = 



To find xo we again split the derivative of the output O'(xo) into components 
due to the step and due to noise only (call these 0' s and 0' n respectively). 



^ /'(x)Au_i(x - x) dx = y_ Qo Af'{x) dx = Af{x ) (2.6) 

The response of the derivative filter to the noise only (at any output point) 
will be a Gaussian random variable with mean zero and variance equal to the 
mean- squared output amplitude 

E[0'„ 2 (x)] = n 2 /_ + J° f'\x) dx (2.7) 

We now add the constraint that the function / should be antisymmetric. 
An arbitrary function can always be split into symmetric and antisymmetric 
components, but it should be clear that the symmetric component adds nothing to 
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the detection or localizing ability of the operator but will contribute to the noise 
components that affect both. The Taylor expansion of O' s {x ) about the origin gives 

O' s (x ) = Af(x Q ) « * A/'(0) ( 2 - 8 ) 

For a zero- crossing in the output O' we require 

O'(x ) = O' s (x ) + 0' n (x Q ) = (2.9) 

i.e. O' s {x ) = -O' n (x ) and E[O'?{x )} = £[OS(*o)]. Substituting for the two 
outputs from (2.7) and (2.8) we obtain 

where 6i is an approximation to the standard deviation of the distance of the 
actual maximum from the true edge. The localization is defined as the reciprocal 
of 6xq 



Localization = 



n ° >//+" /»»(*) dx 
Again we define a performance measure A which is a property of the operator only 

Location - A A A _ /Wl (2 . u) 

Having obtained both our desired criteria, we now have the problem of 
combining them in a meaningful way. It turns out that if we use the product of the 
two criteria we obtain a measure which is both amplitude and scale independent. 
This measure is a property of the shape of the impulse response / only, and will be 
the same for all functions f w obtained from / by spatial scaling. In fact the choice 
of the combination will not affect the form of the solution since the variational 
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equations depend only on the individual terms in the criteria. The product of the 
two criteria is 



E(/)A(/) = J-o°f( x ) dx 



l/'(0)| 



y/f±Z P(*) dx y/j^f' 2 (x) dx 



(2.12) 



To illustrate the invariance of this criterion under changes of scale, we consider 
the performance of an operator whose impulse response is f w where f w (x) = /(£). 
The performance of the scaled operator is 



X(fw)HM = 



v /^7 f-oof(*)dx 



V/+» Pi*) dx 



l/'(0)| 



.^\IS±£f' 2 {x)dx 



(2.13) 



where the bracketed terms correspond in order to the detection and localization 
criteria. We see from this form that the signal to noise performance of the operator 
varies as >Jw, while the localization varies as the reciprocal of y/w. An operator with 
a broad impulse response will have good signal to noise ratio but poor localization 
and vice versa. With this form of the composite criterion though, the product of 
detection and localization terms is the same for all f w . 

This result suggests that there is a class of operators that have optimal 
performance and that they are related by spatial scaling. In fact this result is 
independent of the choice of combination of the criteria. To see this we assume 
that there is a function / which gives the best localization A for a particular E. 
That is, we find / such that 



£(/) = c i and A(/) is maximized 



(2.14) 



Now suppose we seek a second function f w which gives the best possible 
localization while its signal to noise ratio is fixed to a different value, i.e. 



£(/«/) = c 2 while A(f w ) is maximized 



(2.15) 
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If we define f w as before, f w (x) = /(*), and further if we set 

2 / 2 

Then the constraint on /„, in (2.15) translates to constraint on / which is 
identical with (2.14). So to solve (2.15) we find / such that 

S(/) = ci and A(/) is maximized 

Which has the same solution as (2.14). So if we find a single such function /, 
we can obtain maximal localization for any fixed signal to noise ratio by scaling 
/. Thus our choice of the composite criterion was not arbitrary but highlighted a 
natural constraint or "uncertainty principle" for detection of step edges in noise. 
We can obtain arbitrarily good localization or detection by scaling but not both 
simultaneously. 

We will find (eventually) that the above analysis is valid but that the criterion 
as given is still underspecified. While it does lead to a plausible class of solutions, 
performance will be poor because we have so far ignored an important aspect of 
the detection process. Namely the detector should not produce multiple outputs in 
response to a single edge. In the next section we find the solutions to the above 
optimization problem, and highlight their weakness with regard to multiple edge 
responses. 

2.2. The Optimal Operator for Steps 

The optimal edge detection operator has now been defined implicitly by 
equation (2.12). All that remains is to find a function which maximizes this large 
expression. We must make some simplifications before a solution can be found using 
the calculus of variations. We cannot directly find a function which maximizes the 
quotient of integrals in equation (2.12) since each depends on f(x). Instead we set 
all but one of the integrals to undetermined constant values in an analogous manner 
to the method of Lagrange multipliers. We then find the extreme value of the 
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remaining integral (since it will correspond to the maximum in the total expression) 
as a function of the undetermined constants. The values of the constants are then 
chosen so as to maximize the value of the remainder of the expression, which is now 
a function only of the three constants. Given these constants, we can completely 
uniquely specify the function f(x) which gives the global maximum of the criterion. 

The second simplification involves the limits of the integrals. The two integrals 
in the denominator of (2.12) have limits at plus and minus infinity, while the integral 
in the numerator has one limit at zero and the other at minus infinity. Since the 
function / should be antisymmetric, we can use the latter limit for all integrals. 
The denominator integrals will have half the value over this subrange that they 
would have had over the full range. Also, this enables the value of / ; (0) to be set as 
a boundary condition, rather than expressed as an integral of /". The lower limit 
of all the integrals at minus infinity should be set to some finite negative value, say 
— W since we will be dealing with an operator of finite extent. These simplifications 
allow us to exploit the isoperimetric constraint condition (see Courant and Hilbert 
1953). This allows us to combine a set of constraint integrals that share the same 
limits as the integral being extremized into a single variational equation. 

So the problem of finding the maximum of equation (2.12) reduces to that 
of finding the minimum of the integral in the denominator of the S.N.R. term, 
subject to the constraint that the other integrals remain constant. By the principle 
of reciprocity, we could have chosen to extremize any of the integrals while keeping 
the others constant, but the solution should be the same. We seek some function / 
chosen from a space of admissible functions that minimizes the integral 



L w f 2 (*) d * (2.16) 

subject to 






20 



/ 



o 2 



/'(O) = c 3 (2.17) 

The space of admissible functions in this case will be the space of all continuous 
functions that satisfy certain boundary conditions, namely that /(0) = and 
y( — \Y) = 0. These boundary conditions are necessary to ensure that the integrals 
evaluated over finite limits accurately represent the infinite convolution integrals. 
That is, if the nth derivative of / appears in some integral, the function must be 
continuous in its (n-l)st derivative over the range (— oo, +oo). This implies that the 
values of / and its first (n-1) derivatives must be zero at the limits of integration, 
since they must be zero outside this range. 

The functional to be minimized is of the form J a F(x, /, /') and we have a 
series of constraints that can be written in the form J a G{(x, f, f) = q . Since 
the constraints are isoperimetric, i.e. they share the same limits of integration as 
the integral being minimized, we can form form a composite functional ty(x, f, /') 
as a linear combination of the functionals that appear in the expression to be 
minimized and in the constraints (Courant and Hilbert 1953). Finding a solution for 
this unconstrained problem is equivalent to finding the solution to the constrained 
problem. The composite functional is 



*(*,/,/') = F(x,f,f) + XiGifx,/,/*) + XiCj(*,/,/0 + ... 
Substituting, 

*(*, /./') = / 2 + Xi/' 2 + X 2 / (2.18) 

It may be seen from the form of this equation that the choice of which integral 
is extremized and which are constraints is arbitrary, the solution will be the same. 
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This is an example of what is known as reciprocity in variational problems. The 
choice of an integral from the denominator is simply convenient since the standard 
form of the Euler equations applies to minimization problems. The Euler equation 
that corresponds to this functional is 



d 
\\f f , _ $, = o 

dx J J 

Where tyf denotes the partial derivative of ty with respect to /. This gives 



2/(x) - 2Xi/"(x) + X 2 = (2.19) 

The general solution of this differential equation is 



/(*) = ~ + <*ie ax + a 2 e— (2.20) 

Where a = \ x and the constants a\ and a^ are determined by the boundary 
conditions /(0) = and f(—W) = 0. When these constraints are added the 
function / can be written in the form 



From this we can obtain expressions for the signal-to-noise ratio and localization as 
a function of the parameters Xi and X2. To simplify the expressions we will assume 
a width W of 2 and make use of the scaling properties from equation (2.13). This 
gives 



_ 2a cosh a — 2 sinh a 

E = -===== (2-22) 

y 2a 2 cosh 2a — 3a sinh 2a + 4a 2 



. a sinh a 

A = (2.23) 

Va sinh 2a — 2a 2 
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Both these expressions are functions only of a, and we can investigate the behaviour 
of / as a tends to its limiting values and +00. As a tends to zero we find that 
function / tends to a parabola whose equation is 

/(*) = -X 2 a 2 (l - i 2 ) (2.24) 

The corresponding values of signal-to-noise ratio and localization are 

s = \H a = \H ( 2 - 25 > 

V 6 V 4 

When the value of a approaches infinity, we find that the function approaches 
a constant over the range (-2,0) (recalling that W = 2), and that the signal-to-noise 
ratio tends to 1. This is a very small increase over the corresponding value as a 
tended to zero. However, the localization term, ^, increases without bound. From 
this result it would seem that a difference of boxes function (the antisymmetric 
extension of the derived function over the range [-2,2]) gives the best possible 
signal-to-noise ratio with arbitrarily good localization. This function is in fact the 
optimal Wiener filter for the step edge. 

This operator has been used quite extensively because of its simplicity and 
because it is easy to compute, as in the work of Rosenfeld and Thurston (1971), and 
in conjunction with lateral inhibition in Herskovits and Binford (1970). However it 
has a very high bandwidth and tends to exhibit many maxima in its response to 
noisy step edges, which is a serious problem when the imaging system adds noise 
or when the image itself contains textured regions. These extra edges should be 
considered erroneous according to the first of our criteria. However, the analytic 
form of this criterion was derived from the response at a single point (the centre 
of the edge) and did not consider the interaction of the responses at several nearby 
points. We need to make this explicit by adding a further constraint to the solution. 

2.3. Eliminating Multiple Responses 

If we examine the output of a difference of boxes edge detector we find that the 
response to a noisy step is a roughly triangular peak with numerous sharp maxima 
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in the vicinity of the edge (see figure 2.2). These maxima are so close together that 
it is not possible to select one as the response to the step while identifying the 
others as noise. We need to add to our criteria the requirement that the function 
/ will not have "too many" responses to a single step edge in the vicinity of the 
step. We need to limit the number of peaks in the response so that there will be 
a low probability of declaring more than one edge. Ideally, we would like to make 
the distance between peaks in the noise response approximate the width of the 
response of the operator to a single step. This width will be about the same as the 
operator width W. 

In order to express this as a functional constraint on /, we need to obtain 
an expression for the distance between adjacent noise peaks. We first note that 
the mean distance between adjacent maxima in the output is twice the distance 
between adjacent zero-crossings in the derivative of the operator output. Then we 
make use of a result due to Rice (1944, 1945) that the average distance between 
zero-crossings of the response of a function g to Gaussian noise is 



Where R(t) is the autocorrelation function of g. In our case we are looking for the 
mean zero-crossing spacing for the function /'. Now since 



i „ 
R W = /_«, »'(*) ix ■»! «*(0) = - / Ax) dx 

J — 00 

The mean distance between zero-crossings of /' will be 

( J±Z f*{x) d*\* 

The distance between adjacent maxima in the noise response of /, denoted 
Xmax, will be twice x zc . We set this distance to be some fraction k of the operator 
width. 
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Figure 2.2. Responses of difference of boxes and first derivative of Gaussian 
operators to a noisy step edge 



= 2x xc = kW 



This new constraint adds only one term to the composite functional # since 
the integral of f' 2 already appears in \P from the localization criterion. While in 
the original functional this integral appeared in the denominator of a quantity to 
be maximized, (i.e. the localization criterion) it now appears in the numerator of 
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the mean distance between maxima, which is a constraint on the solution. It is now 
no longer clear what the sign of its Lagrange multiplier should be. This leads to 
several possible solutions for / as we shall see. The new functional is given by 

*(*, /, /',/'') = /» + Xl/ * + X 2 /'' 2 + X 8 / (2.28) 

The Euler equation corresponding to a functional of second order is 

d T d 2 

*' ~ £*' + ^*>" = ° 

When the above # is substituted into the Euler equation we get 

2/(x) - 2X 1 /''(z) + 2X 2 /""(x) + X 3 = (2.29) 

The solution of this differential equation is the sum of a constant and a set 
of four exponentials of the form e*** where 7 derives from the solution of the 
corresponding homogeneous differential equation. Now 

2 - 2X l7 2 + 2X 27 4 = 



^-^Hfe) ~£ ( 2 - 3 °) 

This equation may have roots that are purely imaginary, purely real or complex 
depending on the values of Xi and X 2 . Prom the composite functional # we can 
infer that X 2 is positive (since /" 2 is to be minimized) but it is not clear what 
the sign or magnitude of \ x should be. The Euler equation supplies a necessary 
condition for the existence of a minimum, but it is not a sufficient condition. By 
formulating such a condition we can resolve the ambiguity in the value of \ x . To 
do this we must consider the second variation of the functional. Let 
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An = f'nx,f,f',f")dx 

Then by Taylor's theorem, 



J{f + eg] = J{f] + € J l [f i g] + ±e*J 2 [f + pg ) g} 

where p is some number between and e, and g is chosen from the space of 
admissible functions, and where 



Mf> 9] = [ *f9 + */'</ + *f»9" dx 

JXn 



J M>ti = £*" g * + *>'>'*' 2 + *>">"*" 2 (2.31) 

+2V S f,gg' + 2Vff*<ftf + 2V f f„gg n dx 

Note that J\ is nothing more than the integral of g times the Euler equation 
for / (transformed using integration by parts) and will be zero if / satisfies the 
Euler equation. We can now define the second variation 6 2 J as 



6 l J = ^Hf,9] 

The necessary condition for a minimum is 6 2 J > 0. We can substitute for the 
second partial derivatives of ^ from (2.29) and we get 

fV + Xi^ + X 2 (4<te>0 (2.32) 

JXq 

which we transform using integration by parts to 



f 

/ 9 2 — >^i99xx + ^29 2 xx dx>0 

JXo 

which can be written as 
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L i 9 ~ ^2 9xx ) + ( X * - l 1 )^ dx ^ ° 

The integral is guaranteed to be positive if the expression being integrated is 
positive for all x, so if 



x 2 > -r 

4 

then the integral will be positive for all x and for arbitrary g, and the extremum 
will certainly be minimum. If we refer back to (2.28) we find that this condition 
is precisely that which gives complex roots for 7, so we have both guaranteed 
the existence of a minimum and resolved a possible ambiguity in the form of the 
solution. We can now proceed with the derivation and assume four complex roots 
of the form 7 = ±a± iu With a, u real, such that 

a 2 -a, 2 = AL and 4a V = _ X ?~^2 , , 

2X 2 4X1 [ } 

The general solution may now be written 



f(x) = a 1 e ax smux + a 2 e ax cosu>x + a 3 e- ax smujx + a A e- ax cosux + c (2.34) 
This function is subject to the boundary conditions 

/(0) = f(-W) = /'(0) = s }\-W) = 

Where 5 is an unknown constant equal to the slope of the function / at 
the origin. These four boundary conditions enable us to solve for the quantities 
eti through a 4 in terms of the unknown constants a, w, c and 5. The boundary 
conditions may be rewritten 

a 2 -|- a 4 -f c = 
28 



aie a sinu;-}-a2e a cosw + a 3 e a sina;-|-a4e a cosa;-f-c = 



a\UJ + aid + 030; — 04a = s 



aie a (a sin a; + a; cos a;) + a 2 e a (a cos u — u sin w) , . 

-f-a3e —a (— a sin a; + a; cos a;) + tMe - a (— a cos u — a; sin a;) = 

These equations axe linear in the four unknowns 01 , 02, 03, 04 and when solved 
they yield 



fll = c(a(a — a) sin 2a; — au cos 2a; -f (—2a; 2 sinh a + 2a 2 e a ) sin u 
+2au sinh a cos a; -j- a;e -2a (a + a) — aw J/4f w 2 sinh 2 a — a 2 sin 2 wj 



a 2 



= c[ a(<r — a) cos 2a; + auJ sm ^ u — 2aa; cosn a sm w — ^a; s * n k a cos u 
+2a; 2 e~ a sinh a + a(a — a) j/il w 2 sinh 2 a — a 2 sin 2 a; J 

a 3 = c(—a(a + a) sin 2a; + <* w cos 2w + ( 2a;2 sinn a + 2a 2 e a ) sin a; 
+2aw sinh a cos a; + a;e 2a (a — a) — aa; J/4f a; 2 sinh 2 a — a 2 sin 2 a; J 



fl4 — c ( _ a (a -j- a) cos 2a; — aa; sin 2a; + 2aa> cosh a sin a; -f 2a; 2 sinh a cos u 

— 2a; 2 e a sinh a + a(a + a) J/4f a; 2 sinh 2 a — a 2 sin 2 a; J 

(2.36) 

where (J is the slope s at the origin divided by the constant c. On inspection of 
these expressions we can see that 03 can be obtained from a\ by replacing a by 
— a, and similarly for 04 from a 2 . 



29 



The function / is now parametrized in terms of the constants a, u, a and c. 
We have still to find the values of these parameters which maximize the quotient 
of integrals that forms our composite criterion. To do this we first express each 
of the integrals in terms of the constants. Since these integrals are very long 
and uninteresting, they are not given here but for the sake of completeness they 
are included in Appendix I. We have reduced the problem of optimizing over 
an infinite-dimensional space of functions to a non-linear optimization in three 
variables a, u and a (as expected, the combined criterion does not depend on c). 
Unfortunately the resulting criterion, which must still satisfy the multiple response 
constraint, is probably too complex to be solved analytically, and numerical methods 
must be used to provide the final solution. 

In fact there is really no best function / for a given W because the shape of / 
will depend on the multiple response constraint, i.e. it will depend on how far apart 
we force the adjacent responses. Figure (2.3) shows the operators that result from 
particular choices of this distance. Recall that there was no single best function for 
arbitrary w, but a class of functions which were obtained by scaling a prototype 
function by w. We will want to force the responses further apart as the signal to 
noise ratio in the image is lowered, and it is not clear what the value of signal 
to noise ratio will be for a single operator. However, this design is based on the 
use of multiple widths of operator and on a decision procedure which selects the 
smallest operator that has an output signal to noise ratio above a given threshold. 
This means that all operators will spend most of their time operating close to 
their output E thresholds. We should therefore try to choose a spacing which gives 
acceptable multiple response behaviour under these conditions. 

A rough estimate for the probability of a spurious maximum in the 
neighbourhood of the true maximum can be formed as follows. Recall that maxima 
in an operator output correspond to zero-crossings in the derivative of this output. 
If we look at the first derivative of the response to an ideal step we find that 
it is approximately linear near the centre of the step. There will be only one 
zero-crossing if the slope of this response is greater than the slope of the response 
to noise only. This latter slope is just the second derivative of the response to noise 
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only, and is a Gaussian random variable with standard deviation 



• + 00 



= ^/-oo '"^H 



while the slope of the zero-crossing at the centre of the edge is Af{0). The 
probability p m that the former slope exceeds the latter is given in terms of the 
normal distribution function $ 



Pm 



--♦ m 



We can choose a value for this probability as an acceptable error rate and this 
will determine the ratio of /(O) to a 8 . Rearranging we obtain. 



A|/ ' (0)l -*- X (l-Pm) (2-37) 



no }/!+£ P\x) dx 

And we can see the explicit dependence of this constraint on the image signal 
to noise ratio. We can eliminate this dependence by relating the probability of a 
multiple response p m to the probability of falsely marking an edge p/ where we 
define 



P/ = l-*(£) 
and we have finally that 



i/'(o)i = t i±m* (2 . 38) 



y/l+Z /" 2 W dx y/j+Z P(x) dx 



where k is a constant determined by the values of the two probabilities. If we choose 
to set p m equal to pf then the value of k is one. Unfortunately, the largest value of 
k that could be obtained using the constrained numerical optimization was about 
.58. This corresponds to an inter-maximum spacing of 1.2 (in units of W). This is 
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the final form of linear operator that we will use. It is illustrated in the last of th 
series of graphs in figure (2.3). Its performance is given by the product of £ and A 
and it has the value 



EA = 1.12 



(2.39) 



Inspection of the shape of this operator in figure (2.3) suggests that it may be 
possible to approximate it using a first derivative of a Gaussian G f where 

G( X ) = ex P (-^) 

The reason for doing this is that there are very efficient ways to compute the 
two dimensional extension of the filter if it can be represented as some derivative 
of a Gaussian. This will be discussed in detail in chapter 5. We now compare the 
performance of a first derivative of a Gaussian filter with the optimal operator. The 
impulse response of the filter is now given by 



/(*) = -£«*(-£) 
and the terms in the performance criteria have the values 

l/'(0)| = 4 

J f(*)dx = 1 

J — oo 



(2.40) 



y+oo 

J — QO 



/. 



f\x) dx = ^* 
00 J 2(7 



^f'\x)dx = h^ 
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Figure 2.3. Optimal step edge operators for various values of k, from top to 
bottom they are k = 0.075, 0.15, 0.25, 0.5, 0.65, 0.7 
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J-oo i J 8a 5 (2.41) 

The overall performance index for this operator is 



EA = V S ~ °* 92 (2.40) 



While the k value for this filter is, from (2.38) 



*. / 4 

* = Vtt « 0.51 
V 15 



The performance of this operator is worse than the optimal operator by about 
20%, and ,f multiple response measure *, is worse by about 10%. It would probably 
be d.fficult to detect a difference of this magnitude by looking at the performance 
of the two operators on real images, and because the first derivative of Gaussian 
operator can be computed with much less effort in two dimensions (but see section 
5.2), ,t has been used exclusively in experiments. The impulse responses of the two 
operators can be compared in figure (2.4). 

2.4. Finding an Operator by Stochastic Optimisation 

The previous section contained the derivation of a "closed form" for an optimal 
edge detector for step edges. Even in the derivation of this closed form for the 
operator, a numerical optimisation was necessary to obtain the coefficients that 
appear in its analytic form. We saw that this method required the solution of 
very complex simultaneous systems of non-linear equations. It is likely that if the 
techmque were applied to other problems it would seldom be possible to find closed 
form solutions for the operators. However, this does not mean that a useful operator 
cannot be derived using these techniques. There are two alternative approaches 
both of which were used in the derivation of the step edge operator, and which can' 
be applied when the expressions become too complex to be solved. 

(i) The first of these was used in the previous section and involves the use of 
numerical methods for the determination of some finite number of parameter 
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Figure 2.4. (a) The optimal step edge operator, (b) The first derivative of a 
Gaussian 



values once the solution has been reduced to a parametric form. In fact 
even infinite dimensional objects, e.g. the impulse response of a filter, can be 
approximated by a finite dimensional discrete filter if appropriate constraints 
on the bandwidth (of the infinite filter) are met. All that is required is a 
deterministic criterion which can be applied to the parametric form of the 
operator and which measures the "goodness" of the operator with respect to 
that criterion. 

(ii) The second method is necessary when it is not even possible to write down 
a closed form for the criterion of optimality. This problem arises when the 
image model contains some random component (e.g. Gaussian noise) and it 
is then necessary to form criteria that reflect some meaningful statistics on 
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the behavour of operator on a„ ense m ble of images. Gaussian independent 

random presses are particuiady easy to analyse, but even with Gaussian 

stat.st.cs, the Cosed form criteria for step edges ,ed to very complex solutions 

However, .„ the further work section of this thesis we will propose a m ethod for 
trans o rming problems ^ ^ ^ ^.^ ^^ ^ ^^ 

probelms involving only Gaussian independent processes. 

In fact in the work leading up f this report, the second method was used successfu.Iy 
before a closed form solution using the first me thod was obtained. This is almost 
certamly the mh rather tha „ the except . on whue ^ ^ th£ stQchastic 

leads to an approximate solution, and may not be feasible if the parameter space 
- Poorly conditioned, it is stil, felt that it is a useful technique and may guide the 
search for an analytic solution. 

The stochastic method begins, as did the analytic method, with a model 

of the .mage. Again we consider a step edge with superimposed white Gaussian 

no,e. We seek a filter / which maximizes some criterion but in this case we 

cannot characterize the filter by its (infinite) impulse response. Instead we consider 
a d.s Crete filter L , we represent the fi]ter ^ . ts . mpu]se ^^ 

po,t.ons 0, r, 2r etc. Provided that the bandwidth of the corresponding continuous 
-pulse response filter is less than the Ny quis t frequency, JL. the continuous filter 
* comp.etely described by its discrete approximation. It turns out that for the step 
edge operators, which have small bandwidth, only about 1 2 samples are necessary 
Th,s was not known before the optimization was done and 32 samples were used 
for the discrete filter. 

The optimization algorithm is essentially a hill-climbing search over the space 
of poss.b.e filters. It proceeds by continuously iterating through the following steps 

(i) Create a (discrete) noisy edge by adding Gaussian random numbers to the 
sampled values of a step edge. 

(ii) Convolve the filter with this edge, and evaluate the response, 
(iii) Perturb the filter coefficients (sampled values) by a small amount 
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(iv) Convolve this new filter with the edge, and evaluate the new response. 

(v) Change the filter based on the effects of the perturbation in (iii). 

Note that this procedure is not guaranteed to lead to a solution even in the 
case where the analytic solution space is convex. It differs from deterministic 
hill-climbing procedures in that the "hills" (the contours of constant evaluation 
in parameter space) are not fixed but vary from iteration to iteration. There is a 
random component in any particular evaluation caused by the presence of noise in 
the modelled image. We can only say that the limit of the mean of a number of 
such evaluations will be the contours that would be obtained from the deterministic 
criteria. In fact the magnitude of the changes caused by image variations greatly 
overshadowed the magnitude of the changes due to the perturbations in the filter 
coefficients. It was therefore important to apply the perturbed and original filters 
to the same image. 

To see when this method should converge, we assume that there exists a 
deterministic evaluation function F over the parameter space, and such that we 
can locally estimate the evaluation of an n-tuple of parameters p as 

Ei = F(?) + r 

where r is a random variable from some unknown distribution which models 
the effects of the image noise. If we now perturb the filter coefficients by some small 
dp, we obtain the new evaluation 

E 2 = F(p + 6p) + r 

assuming that the value of r is constant (the image has not changed) over some 
small neighbourhood of p. If we subtract E\ from E 2 and divide by \6p\ we obtain 



Ea-Bi _ F(p + Sp) - F(P) 

\6P\ ~ \SP\ (2A3 > 
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Now 



where U is a unit vector in the direction of Sp. By using » normal perturbations 
*fc with n corresponding unit vectors 8,- and forming the sum 

^(E 2 — E 1 )6p i » 

i±i W ~ ,g( Vi "® • ^ = Vf Q») ( 2 .44) 

we have formed an estimate of the gradient of the evaluation function at the 
point j> m parameter space. Another way of forming an estimate of VF is to use 
perturbations which are randomly distributed. By randomly distributed we mean 
that each component of 6? is an independent random variable with zero mean 
and variance a\. Then the expectation value of the perturbation weighted by the 
change in evaluation is 

El(E 2 -E 1 )6pl = V F (?y p (245) 

So we can also achieve an estimate of the gradient of F by making random 
perturbations in the filter coefficients and weighting those perturbations by the 
change in evaluation. This method provides a more uniform coverage of the 
ne.ghbourhood around a single parameter space point than does a particular choice 
of orthonormal perturbations. The implementation uses random perturbations and 
a short term averaging filter to obtain an estimate of the gradient of F over several 
.teration, The filter used has a single pole (i.e. its response to an impulse is an 
exponentaally decaying sequence), and can be described by the difference equation 

_- (g 2 ,-g iy ) 0gy 
"' ^ +%-i (2.46) 

where <J, is the estimate of the gradient of / at the j* iteration, and the subscripted 
quantataes E 2j , E u and % are the values of these quantities at the ,*> iteration fi 
» a tame constant between and 1, and it determines the "inertia" of the system. 
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The algorithm performs a simple hill-climbing by taking the estimate of the 
gradient at each iteration and adding a multiple of it to the current value of p. 
However we have (necessarily) added a time constant in the estimator for VF so 
that we can obtain a continuously updating estimate while simultaneously climbing 
using the existing estimate. We now have a system with both "inertia" from the 
average over the previous iterations and a "viscosity" caused by the fact that this 
average decays with time (assuming f3 is less than 1). Therefore it is possible for 
it to overshoot a minimum, or even to oscillate several times before settling at the 
minimum. It was necessary to set the time constant empirically in order to obtain 
accurate estimates of gradient without excessive overshoot. The behaviour of the 
system is roughly analogous to a ball rolling along a contoured surface under the 
influence of gravity, with perfectly viscous drag. 

The major reason for resorting to stochastic methods for the optimization was 
that the evaluation criterion is a function of a particular response, rather than an 
estimate of the behaviour of the filter on a large set of inputs. But the abstract 
criteria should be the same. The heuristic criterion should evaluate both the error 
rate and the localizing ability of the operator. The criterion actually used is 

£ v = -5011-Twl-d^ (2.47) 

where n max is the number of local maxima that occur in a fixed neighbourhood 
of the edge, and d max is the distance of the strongest maximum from the centre of 
the true edge. Note that the two terms in the expression are "penalty" measures, 
hence the two negations. 

Figure (2.5) shows the algorithm converging to a solution after the filter has 
been initialized to a difference of boxes. In figure (2.6) the initial filter coefficients are 
random and independent. It is worth comparing figure (2.5) with figure (2.3), which 
showed the best analytic form of the operator for various inter- maximum distances. 
It seems that the stochastic method moves through parameter space in such a way 
that it passes through several of these analytic optimal forms before reaching a 
global extremum. The two methods produce similar solutions even though their 
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ntena are sh g ht,v differe „, This is strong evidence that ^ ^ rf ^ 
detector ,s robust with respect to the actual choice of criteria, so l ong M the criteria 
depend „ both error rate and locating ability. We w i„ see further evidence of 
this in the work of Shanmugam et al (1979) in a later chapter. 
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tt<RRT-HRLF-FIX-32. 5740642> 

;; after 200 iterations, perfornance index is 393 
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tt<RRT-HRLF-FIX-32 5740642> 

;; after 3300 iterations, perfornance index is 312 




tt<RRT-HRLF-FIX-32. 5740642> 

;; After 14100 iterations, perfornance index is 204 




tt<RRT-HRLF-FIX-32. 5740642> 

;; Rfter 43500 Iterations, perfornance index is 136 




»<RRT-HRLF-FIX-32. 31366670> 

;; Rfter 120,000 iterations, perfornance index is 119 



Figure 2.5. Convergence of the stochastic optimization procedure after 
initialization to a difference of boxes 
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initiStL^d°r r K of the stochastic 



optimization procedure after 
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3. Two or More Dimensions 

In one dimension we can characterise the step edge in space with one position 
coordinate. In two dimensions an edge also has an orientation. In this chapter we 
will use the term "edge direction" to mean the direction of the tangent to the 
contour that the edge defines in two dimensions. Suppose we wish to detect edges 
of a particular orientation. We create a two-dimensional mask for this orientation 
by convolving a linear edge detection function aligned normal to the edge direction 
with a projection function parallel to the edge direction. A substantial saving in 
computational effort is possible if the projection function is a Gaussian with the 
same a as the (first derivative of the) Gaussian used as the detection function. 
It is possible to create such masks by convolving the image with a symmetric 
two-dimensional Gaussian and then differentiating normal to the edge direction. In 
fact we do not have to differentiate normal to every possible edge direction because 
the slope of a smooth surface in any direction can be determined exactly from its 
slope in two directions. The simplest form of the detector uses this method. 

After the image has been convolved with a symmetric Gaussian, the edge 
direction is estimated from the gradient of the smoothed image intensity surface. 
The gradient magnitude is then non-maximum suppressed in that direction. The 
directional non-maximum suppression is equivalent to the application of the 
following non-linear differential predicate 

where n = ]y§^/[) which has the same zero-crossings as 

VS • V(V5 • VS) = (3.1) 

where S = G*I and where I is the image and G is a symmetric Gaussian. This is 
readily varified by using the substitution 
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—5 = n ' V5 _ 



dn 



n 



The fonn of „o„-,i„e ar second derivative operator used in (3.1) turns out to 

tm s i:t: pro r sed by Havens Md strikwerda ^ **» - *«* 

Seir (i983) - * - — in *«* « - — of :i 

This operator actually locates either maxima or minima, by locati„ g the 
-o-cross.ngs in the second derivative in the edge direction. In principals 
operator could he nsed to implement an edge detector in an arbitral mle 
d.n>ens.ons y first convoking the image with a symmetric n-dimensiona, Gaussian 
The convolut.on with an n-dimensional Gaussian is highly efficient because the 
Gaussian is decomposable into n linear filters. 

There are other more pressing reasons for using a smooth projection function 
such as a Ganss.an. When we a Ppl y a linear operator to a two dimensional image 
we form at every point in the output a weighted sum of some of the input values. For 
e edge detector described here, this sum wil, be a difference between ,oca. averages 
of the different sides of the edge. This output, before non-maximum suppression 
represent a Kind of moving average of the image. Ideally we would Z to use' 
an mfimte projection function, but real edges are of limited extent. It is therefore 
necessary to window the projection function (see Hamming 1 983 ). If the window 
unction , abrupt* truncated, e.g. if it is rectangular, the fi.tered i mage will not be 
smoth because of the very high bandwidth of this window. This result is anaiogous 
t^ t e Gibbs phenomenon in Fonrier theory. When non-maximum suppression is 
apphed these variations will tend to produce edge contours that Vander" or that 
in severe cases axe not even continuous. 

The solution is to use a smooth window function. In signal processing, typical 
wmdows used are the Hamming and Hanning window, The Gaussian is a reasol,! 
approx.mat.on to both of these, and it certain* has very low bandwidth for a 
,ven jaha-( The Gaussian „ ^ ^ ^ ^ 

of bandw.dth and frequency). The effect of the window function becomes very 
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marked for large operator sizes and it is probably the biggest single reason why 
operators with large support were not practical until the work of Marr and 
Hildreth on the Laplacian of Gaussian. The perceptive reader will probably see 
the similarity between these smoothness constraints in the projection function to 
preserve continuity of contours in the edge direction, and the smoothness of the 
detection function implied by the addition of the multiple response constraint. 

It is worthwhile here to compare the performance of this kind of directional 
second derivative operator with the Laplacian. First we note that the two- 
dimensional Laplacian can be decomposed into components of second derivative in 
two arbitrary orthogonal directions. If we choose to take one of the derivatives in 
the direction of principal gradient, we find that the operator output will contain 
one contribution that is essentially the same as the operator described above, and 
also a contribution that is aligned along the edge direction. This second component 
contributes nothing to localization or detection, (the surface is roughly constant in 
this direction) but increases the output noise. This will be verified analytically in 
chapter 7. 

A version of the detector which used the Gaussian convolution followed by 
directional non-maximum suppression has been implemented and performed very 
well. Examples of its output will be given in chapter 6. While the complete 
detector includes multiple operator widths, orientations and aspect ratios, they are 
a superset of the operators used in the simple detector. In typical images, most of 
the edges are marked by the operators of the smallest width, and most of these by 
non-elongated operators. However, as we shall see in the following sections, there 
are cases when larger or more directional operators should be used, and that they 
offer considerably better performance when they are applicable. The key to making 
such a complicated detector produce a coherent output is in the design of effective 
decision procedures for choosing between operator outputs at each point in the 
image. 

3.1. The Need for Multiple Widths 

Having determined the optimal shape for the operator,we now face the problem 
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of choo, lng the width of the operator so as to give the best detection/locahzation 
rade-off ,n a particular app.ication. In general the signal to noise ratio wil, be 
Afferent for each edge within an hnage, and so it wil, be necessary to incorporate 
several w.dths of operator in the scheme. The decision a* to which operator to 
use , must be made dynamically by the a.gorithm and this rehires a local estate 
of the no.se energy in the region surrounding the candidate edge. Once the noise 
energy „ known, the signal to noise ratios of each of the operators wi.l be known If 
we then use a mode, of the probabi.ity distribution of the noise, we can effectively 
calcu.ate the probability of a candidate edge being a false edge (for a given edge 
th« probab.hty will be different for different operator widths). 

Since the a-priori penalty associated with a falsely detected edge is independent 
otheedge strength.it is appropriate to threshold the detector outputs on probabihty 
of error rather than on ma gnitude of response. Once the probability thresho.d is 
set, the minimum acceptable signal to noise ratio is determined. However, there 
may be several operators with signal to noise ratios above the threshold, and in this 
case the smallest operator shou.d be chosen, since it gives the best localization We 
can afford to be conservative in the setting of the threshold since edges missed by 
the smallest operators may be picked up by the larger ones. Effectively the trade-off 
between error rate a.d .ocahzation remains, since choosing a high signal to noise 
rat.o hreshold leads to a lower error rate, but wil. tend to give poorer .ocalization 
smce fewer edges will be recorded from the smaller operators. 

In summary then, the first heuristic for choosing between operator outputs 
.that small operator widths should be used whenever they have sufficient £ 
Tins ,s similar to the selection criterion proposed by Marr and Hi.dreth (1980) 
for choosing between different Laplacian of Gaussian channels. In their case the 
argument was based on the observation that the smaller channels have higher 
resolufon, i.e. there is less possibilty of interference from neighbouring edges That 
argument is also very re.evant in the present context, as to date there has been no 
consideration of the possibility of more than one edge in a given operator support 
Interesting*, Rosenfeld and Thurston (1971) proposed exact.y the opposite criterion 
m the choice of operator for edge detection in texture. The argument given was 
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that the larger operators give better averaging and therefore (presumably) better 
signal to noise ratio. 

Taking this hueristic as a starting point, we need to form a local decision 
procedure that will enable us to decide whether to mark one or more edges when 
several operators in a neighbourhood are responding. If the operator with the 
smallest width responds to an edge and if it has a signal to noise ratio above the 
threshold, we should immediately mark an edge at that point. We now face the 
problem that there will almost certainly be edges marked by the larger operators, 
but that these edges will probably not be exactly coincident with the first edge. A 
possible answer to this would be to suppress the outputs of all nearby operators. 
This has the undesirable effect of preventing the large channels from responding to 
"fuzzy" edges that are superimposed on the sharp edge. 

Instead we use a "feature synthesis" approach. We begin by marking all 
the edges from the smallest operators. Prom these edges, we synthesize the large 
operator outputs than would have been produced if these were the only edges in the 
image. We then compare the actual operator outputs to the synthetic outputs. We 
mark additional edges only if the large operator has significantly greater response 
that what we would predict from the synthetic output. The simplest way to produce 
the synthetic outputs is to take the edges marked by a small operator in a particular 
direction, and convolve with a Gaussian normal to the edge direction for this 
operator. The o of this Gaussian should be the same as the o of the large channel 
detection filter. 

This procedure can be applied repeatedly to first mark the edges from the 
second smallest scale that were not marked by at the first, and then to find the 
edges from the third scale that were not marked by either of the first two etc. 
Thus we build up a cumulative* edge map by adding those edges at each scale that 
were not marked by smaller scales. It turns out that in many cases the majority 
of edges will be picked up by the smallest channel, and the later channels mark 
mostly shadow and shading edges, or edges between textured regions. 
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3.2. The Need for Directional Operators 

So far we have assumed that the projection fnnction i s a Gaussian with the 

HIT ; G 7T ^ ^ ^ deteCti ° n ^ * ** ** *" -tectl 

and locahzatmn of the operator improve as the length of the projection function 

increases. We now Drove ttii. f„, fk . . 'unction 

ow prove th.s for the operator signal to noise ratio. The proof for 

through the origin. This edge can be represented by the equation 

I(x,y) = Au^{y) 
where U is the unit step ^^ ^ ^ ^ ^ ^.^ ^ 

m: th r i additive Gaussian n ° ise ° f - — - * ~ un : 

a e, If we convolve this s,g„ al with a filter whose impulse 

the response to the edge (at the origin) is '' 

/0 /-f-oo 



IS 



The root mean squared response to the noise only 

The Si a. t0 noisfi ^ . ^ quot . ent Qf thfise ^ 

Td r ^ a,feady ^ ^ happens if we scale the *»— — 1 

he edge (equation 2 . 13 ). We now do the same to the projection function by 
replacmg /(., y) by f[Xf , j. The integra , s ^^ * 

U L, K*' j)d X dy = f_° x /_+" f{x> yi) t dx dn 

MCC^W - »4CCf>(^ xdyi f 
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And the ratio of the two is now vfe). The localization A also improves as 
>/?. It is clearly desirable that we use as large a projection function as possible. 
There are obviously practical limitations on this, in particular all edges in an image 
are of limited extent, and few are perfectly linear. However, most edges continue 
for some distance, in fact much further than the 3 or 4 pixel supports of most 
edge operators. Even curved edges can be approximated by linear segments at a 
small enough scale. Considering the advantages, it is obviously preferable to use 
the directional operators whenever they are applicable. The only proviso is that 
the detection scheme must ensure that they are used only when the image fits a 
linear edge model. 

The present algorithm tests for applicability of each directional mask by 
forming a goodness of fit estimate. It does this at the same time as the mask itself 
is computed. An efficient way of forming long directional masks is to sample the 
output of non-elongated masks with the same direction. This output is sampled at 
regular intervals in a line parallel to the edge direction. If the samples are close 
together (less than 2a apart), the resulting mask is essentially flat over most of its 
range in the edge direction and falls smoothly off to zero at its ends. Two cross 
sections of such a mask are shown in figure (3.1). In this diagram (as in the present 
implementation) there are five samples over the operator support. 

Simultaneously with the computation of the mask, it is possible to establish 
goodness of fit by a simple squared- error measure. Since the quantity being estimated 
to produce the mask is the average of some number of values, the squared error is 
just the variance of these values. We then eliminate those operator outputs whose 
variance is greater than some fraction of the squared output. Where no directional 
operator has sufficient goodness of fit at a point, the algorithm will test the outputs 
of less directional operators. This simple goodness of fit measure is sufficient to 
eliminate the problems that traditionally plague directional operators, such as false 
responses to highly curved edges and extension of edges beyond corners, see Hildreth 
(1980). 

This particular form of projection function, that is a function with constant 
value over some range which decays to zero at each end with two roughly half- 
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responses of several masks. g direction (c) Two-dimensional impulse 



CW.ans, is very similar to a commonly used extension of the Hanning window 
Th,s latter function is flat for some distance and decays to zero at each end with 
two half-cosine be„s (Bingham, Godfrey and T^ey 1 967) . We can therefore expect 
■ to have good properties as a moving average estimator, which as we saw at the 
start of the chapter, is an important role fulfilled by the projection function. 

All that remains to be done in the design of directional operators is the 
specficatmn of the number of directions, or e qu ivalently the angle between two 
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adjacent directions. To determine the latter, we need to determine the angular 
selectivity of a directional operator as a function of the angle 6 between the edge 
direction and the preferred direction of the operator. Assume that we form the 
operator by taking an odd number 2N + 1 of samples. Let the number of a sample 
be n where n is in the range — N . . . -\-N. Recall that the directional operator 
is formed by convolving with a symmetric Gaussian, differentiating normal to the 
preferred edge direction of the operator, and then sampling along the preferred 
direction. The differentiated surface will be a ridge which makes an angle 9 to the 
preferred edge direction. Its height will vary as cos0, and the distance of the n th 
sample from the centre of the ridge will be nd sin where d is the distance between 
samples. The normalized output will be 



_ /AX cos0 

O n {0) = 



2N + 1 



* / (nd sinO) 2 \ 

£ exp l — sH 



If there are m operator directions, then the angle between the preferred 
directions of two adjacent operators will be 180/m. The worst case angle between 
an edge and the nearest preferred operator direction is therefore 90/ra. In the 
current implementation the value of dja is about 1.4 and there are 6 operator 
directions. The worst case for is 15 degrees, and for this case the operator output 
will fall to about 85% of its maximum value. 

3.3. Noise Estimation 

To estimate noise from an operator output, we need to be able to separate its 
response to noise from the response due to step edges. Since the performance of 
the system will be critically dependent on the accuracy of this estimate, it should 
also be formulated as an optimization. Wiener filtering is a method for optimally 
estimating one component of a two-component signal, and can be used to advantage 
in this application. It requires knowledge of the autocorrelation functions of the 
two components and of the combined signal. Once the noise component has been 
optimally separated, it is squared and locally averaged. In fact we can further 
improve the separation in the smoothing phase, since when we use the noise estimate 
we will be comparing it to the response of the edge detection operator at a local 
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ma— we know that there is an edge near the centre of the detection operator 

and that th s edge will be D rnH,„m. „ t, operator, 

g WW producmg a known response in the noise separation filter 

frh no.se separate wil, not he perfect). We can nse the positiona, correspondence 

e : : T ses to make the ,ocai averagmg mter •*■— - * «*- 

an edge when there is no noise present. 

Let giix) be signal we are trying tQ detect ( . n th . s ^^ the ^^ 

-d „ („ be some disturbance (the edge response), then denote the autocorrelation 
funcfon of 9l as fl„( r ) and that of 92 as R„( T ) and their „ 
p M . it _ aa "2H r ;, and their cross-correlat on as 

Rn(r), where the correlation of two rea. functions is defined as fofiows 



%W = /^ gi{x) 9j {x + r ) dx 

We assume in this case that the signal and disturbance are uncorrected, so 
*1>M = 0. The optnnal filter is K( x) where K is defined as follows (Wiener 1949) 



*"M = LI( R n(r-x) + R 22{T - 



x))K(x)dx 



eaualtr 7 t0COr ; e,atiOn ° f the "**« ° f * «*« «- -Ponse to white noise is 
equal to the autocorrelation of its impulse response, we have 







If ?2 is the spouse of the operator derived in (2.38) to a step edge then 
wdlhave ff2 ( I ) = te Xp ^_^ t j and P g Wea 



we 



R22{x) = k 2 exp(-~\ 



4a*J 



In the case where the ampfitude of the edge is large centred to the noi M , 
fti + ftt - : approximately a Gaussian and B n is the second derivative rf . 
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The filter K above is convolved with the output of the edge detection operator 
and the result is squared. The next step is the local averaging of the squared noise 
values. The averaging filter is basically a broad Gaussian, but its accuracy can be 
improved in this application by orthogonalizing it to the step edge response. Let 
the averaging filter be expressed as A\{x) — A2(x) where 



Ai(x) = aiexp^-^J 



A 2 {x) = a 2 exp( — ^j 

and the o in the expression for A 2 is the same as for the detection filter. 
(Actually, the optimal shape for A<i is the square of the second derivative of a 
Gaussian, but the use of this function makes the scheme very sensitive to small 
variations in the position of the detection filter maximum). The constants a\ and 
a2 are chosen so that the net response to the squared filtered step response is zero, 
i.e. 

Having formed an estimate of the local noise energy at every point, we can 
now deal with the problem of setting operator thresholds to achieve minimal error 
rate. This is the subject of the next section. 

3.4. Thresholding with Hysteresis 

Virtually all edge detection schemes to date use some form of thresholding. 
If the thresholds are not fixed a priori but are determined in some manner by 
the algorithm, the detector is said to employ adaptive thresholding. The solitary 
exception is the Marr Hildreth scheme, where edges are marked at any zero-crossing 
in the output of a Laplacian of Gaussian filter. This is not a practical proposition 
because there is a very high density of zero-crossings in the response to pure noise 
even if the noise has vanishing energy. Most practical implementations of this 
scheme use thresholding based on the slope of the zero-crossing. 
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The p reS e„t algorithm sets thresholds based on local estates of i mage noise 
and there ore falls Into the Cass of adapt!, thresholding algorithms. It hasl 
a.d.t.ona, co mp le X it y that It makes use of two thresholds to deal wi th the prob, m 

ZT : and below the threshoid *»< the iength ° f *•«*•«■ 

Suppose we have a s,» g ,e threshold set at A th , and that there is an edge in the 
™age such that an operator responds to it with mean output amplitud of ^ 
There w,., be some fluctuation rf ^ ^ ^ * 

ZiiTh ?: we expect the contour to be — — - -^ - 

Itl il: t0 a br0k£n 6dge C ° nt0Ur - WMe * " ' -"""**- case, 

s -k n » a very common problem with thresholded edge detectors. It is very 
d iHcult to set such a threshold so that there is small probability of J£ 

e ges whde retaining high sensitivity. An example of the effect of threho,din g I 
hysteresis is given in figure (3.2). 

One possible solution to this problem, used by Pentland ( 1BO) with Mar, 

™r;:r ossings ' v averase the -*- ° f * — - - ~* - * 

length. If the average ,s above the threshold, the entire segment is marked If the 
average is below threshold no Dart nf tl,. * 

„„ nt . ' P * ° f the contour appears in the output. The 

contour is segmented bv brpatmo. ;♦ „* • K rc 

necessary in the , '" "^ ThiS dentation » 

necessary m the case of zero-crossings since the zero-crossings always form closed 

Conors, wh 1C h obviously do not always correspond to contours in tL image 

the thret T" T^' ^ ""* " ^ * ""*»« "*— **- 
he thr h dmg ,s done with hysteresis. If any part of . contour , ^ & 

t esho d, that int is immediate]y ^^ ^ js ^ enure cMn * 

the concur wh.ch contains the point and which lies above a low thresh ,d Th 

r Tb^T above the high th ™ - be,ow ^ - «• ^ 

t Probab ,ty „ f false edges is feduced because ^ hjgh threshoid ^ 

w.thout nskmg streaking. The ratio of the high to ,ow threshold is usuaily i n J 
range two or three to one. 
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Figure 3.2a. Image thresholded at Th\ 
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Figure 3.2b. Image thresholded at 2 Th x 
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3.2b 



Figure 3.2c. Image thresholded using both the thresholds in figures 3.2a and 
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3.5. Sensitivity to Smooth Gradients 

It has been pointed ont in Binford-Hom (1973) that images frequently contain 
slow gradients, and that edge detectors which are sensitive to these gradients are 
prone to mark multiple edges in regions where the gradient is high. The edge operator 
denved in the last chapter will be sensitive to image gradients and we should now 
-k rf rt ,s possible to eliminate this sensitivity without prejudicing performance. 
One possibility would be to use an operator which is a linear combination of 
two Afferent widths of the optima, operator, such that the resulting operator is 
msen S1 tive to gradients. Suppose the function / is given by 



/(*) = 



r<-£)~5M-£ 



Then we find that 






So this function will certainly be insensitive to gradients, and its performance 
wnl be glV en by equation (2.12). The signal to noise ratio and localization are now 



E = 



^Mvl+alffa-atf 



-.* 



[>/*\pT+A{e\ + *\a\ + o\*l + 4) - 4v /2^3„3 



A = 



£T7^f(4^ + 8*1** + 4°i)(*l - *lf i* 



Asymptotically as * 2 tends to infinity, the above 
limiting values 



expressions tend to the 



£ = 



'2*1 



1* 



v/tt. 



and 



A = 
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Which gives the same overall performance as a simple first derivative of 
Gaussian as given by equation (2.40). In practice a value of a 2 of around 3(7 1 is used 
to reduce the computational expense and to prevent the operator becoming too 
sensitive to nearby edges because of its large support. The overall performance EA 
is reduced by about 30% in this case. Note also that as a 2 approaches o\ we obtain 
a third derivative of a Gaussian, which is similar to the operator sometimes used 
to estimate the strengths of Marr-Hildreth zero-crossings. But the performance is 
reduced in this case, as will be shown in chapter 7. 

The fact that / is insensitive to gradients implies that / may be expressed as 
the derivative of a function g which has zero mean value, and which is symmetric 
because / is antisymmetric. A symmetric function g with zero mean value may be 
thought of as a lateral inhibition operator as described by Binford (1981). Lateral 
inhibition was proposed as a mechanism for reducing sensitivity to gradients. But 
the form of the lateral inhibition operator was not determined analytically when 
in fact it has a direct effect on the performance. 
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4. Finding Lines and Other Features 

Chapters two and three described in some detail the derivation of an opt im a, 
operator for step ed g es m Gaussian noise. The derivation of the anaivtic form tf 
S operator was rather tediofls ^ fa ^ ^ ^ ^ ^ ^ 

n h^d to resort to numerica, methods to find the best vaiues for the parameters. 

I he aiternafve method of finding an operator was bv a brute foree stochastic 
optun^on which did not even use an ana.vtic expression for the crite ia 
opt.ma t, The iatter method was simp.er to imp.ement, but too k muc h ,ot * 
-ve a a so.utio, ft is in theor, more genera,, because to find an operator L, 

Cerent mput wavefornii only „ edge mode] ^ ^ ^ > ' 

heen tned, and the time expenditure neces Sary to modify the stochasUc P L 
and arrive at a solution did not seem justified. 

It would be very useful if there were a more general method which gave a fast 
olutmn and was simple to adapt to new waveforms. There are several LJZ 
ons.dermg opt.ma, detectors for other features. First,,, it has been pointed £ 
(HersWs and Binford 1970) Marr 1976) that step edges are ^ ^ * fc J 

and bar profiles a S be.ng common in real images. Each of these poses a new 
problem m finding an optimal detector. 

Even if we are considering step edge profi.es, a strong case can be made for 

o iz Tt r Gauss,an noise mode,s - we can ~ *» — —- 

Z t " the Same d6Sign teChniqUe - '^ as th < «- - be 

-de, ed as the output of some fi,ter in response to white Gaussian noise. In fact 

uTa, fi7 aUt0 T ,at,0n " ^ rand ° m ~' ^ ^ —* * -iv 

causal fi, er (hnear predictor) which has the same autocorrelation. By app, ying the 

™. of this filter to tbe noi sy waveform to be detected, we obt in a dil 
waveform but it is now bathed in white Gaussian noise . We _ J^ 

d e S ,gn techniques to this new waveform, and the optima, detector Z^Z 

no n'ter. So we have mapped the problem of finding step edges in non-white 
Gauss.au no.se to that of finding other edge profiles in white noise. 
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4.1. General Form for the Criteria 

When we derived the analytic criteria for step edges in chapter 2, there were 
only two places where the form of the input waveform actually affected the criteria. 
By inserting a general feature in place of the step edge we can readily obtain a 
general criterion. Recall that the definition of the signal to noise ratio S was the 
quotient of the responses of the operator to the input waveform and to noise only. 
The response to noise for an operator with impulse response f(x) will be given by 
equation (2.4), and is 



no 



'-f-00 



/-f-OO 
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dx 
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The response of this operator at the "centre" of an arbitrary waveform F(x) is 
similar to equation (2.3) and is just 



/ +0 ° F{-x)f(x) dx 

J — 00 



So the signal to noise ratio for / and any feature F is (assuming no = 1) 



z_ £zn-*)f(*)d* 



j!±£P(x)dx 



(4.1) 



The method for determining the localization of the operator is also similar to 
that used in chapter 2, and we will not describe it fully here. Recall that localization 
was defined as the reciprocal of the standard deviation in the position of the marked 
edge relative to the true edge. To find maxima in the operator response we actually 
locate zero-crossings in the derivative of its response. Localization was defined as 
the quotient of the slope of the zero-crossing and the root mean squared noise in 
the differentiated response. The latter is given by equation (2.7) and has the value 
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The slope of the differentiated operator reponse is 
d 2 



ax 0Ji O 7r=0 



/■+«> r+oo 

L„ F ( x o ~ x)f(x) dx = y_ ^(-x)/"(z) <fz 



And so the localization A becomes (assuming n = 1) 

>//+~/«(x)rfx (4,2) 

The final form of the composite criterion can now be written as the product of (4.1) 
and (4.2) thus 



EA 



J+£ F(-x)f(x) d x J+ g F(-x)f»(x) d x 
/f^P(x)dx yJl±Zf»[ x )te (4 ' 3) 

Thus finding an arbitrary feature detector requires the maximization of this 
functional, subject possibly to some subsidiary constraints such as the multiple 
response constraint (2.25). This is difficult in general, even if the feature F is 
particularly simple, like a step edge. However the form of the functional (4.3) is 
simple enough that given a candidate feature detector we can readily evaluate its 
performance analytically. If the operator impulse response / and the feature F 
are both represented as sampled sequences, evaluation of (4.3) requires only the 
calculation of four inner products between sequences. 

This suggests that numerical optimization can be done directly on the sampled 
operator impulse response. This method can be expected to be much faster than 
the stochastic optimization since the evaluation of performance is exact, and the 
gradient at each point in function space can be accurately estimated. At the same 
time it is very general in that optimization for any waveform only requires a sampled 
version of the waveform. 

The output will not be an analytic form for the operator, but an implementation 
of a detector for the feature of interest will require discrete point-spread functions 
anyway. It is also possible to add additional subsidiary constraints by using a penalty 
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method (see Luenberger 1973). In this method the constrained optimization is 
reduced to one (or possibly several) unconstrained optimization. For each constraint 
we define a penalty function which has a non-zero value when one of the constraints 
is violated. We then find the maximum of 

E(/)A(/) - W P(/) 

Where P is a function which has a positive value only when a constraint is 
violated. The larger the value of m the greater the likelihood that the constraints 
will be satisfied, but at the same time there is a better chance that the method will 
become ill-conditioned. A sequence of values of m may need to be used, with the 
final value from each optimization used as the starting value for the next. The m 
are increased at each iteration so that the value of P(f) will be reduced, until the 
constraints are "almost" satisfied. 

An example of the method applied to the problem of detecting "ridge" profiles 
is shown in figure (4.1). The function F for a ridge is defined to be a flat plateau 
of width w, with step transitions to zero at the ends. The auxiliary constraints are 

(i) The multiple response constraint. This constraint is taken directly from equation 
(2.25), since it does not depend on the form of the feature. 

(ii) The operator should have zero DC component. That is it should have zero 
output to constant input. 

Since the optimal operator for ridges is also symmetric, it will have zero 
response to a constant gradient. This means that it can be represented as the second 
derivative of a function of finite extent, which in turn suggests that there may be 
economical ways of computing operators for several orientations in two dimensions. 

The figure shows two different operators derived for the same feature. The two 
operators differ in the size of their possible support. The second is constrained to 
lie within a region twice the width of the ridge, while the second has a support 
three times the ridge width. The performance of the second operator is very slightly 
worse than the first. However the fact that it requires a smaller support means that 
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Figure 4.1. A ridge profile and the optimal operator for it 



* » hkeiy to be less susceptible to interference from adjacent features. This aspect 
o performance depends strongly on the width of the support, but performance in 
other respects does not. We therefore choose the operator support to be three times 
the ndge width, since at this width there wil. be no interference if the distance 
be ween ridges eaua.s the ridge width, i.e. if the ridges and vafieys have the same 
width. 

Since the width of the operator is determined directly by the width of the 

ndge, there is a suggestion that several widths of operators should be used This has 

not been done in the present implementation however. With this ridge model a wide 

ndge c a „ b e considered to be two closely spaced edges, and the implementation 
a ready lncludes detectors fof ^ ^ ^ ^ ^ ^ & ^ ^^ ^ 

that there are ridges in i mages that are too small to be dealt with effectively by the 
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Figure 4.2. A roof profile and an optimal operator for roofs 



narrowest edge operator. These occur frequently because there are many features 
(e.g. scratches and cracks or printed matter) which result in discrete contours only 
a few pixels wide. 

A similar procedure was used to find an optimal operator for roof edges. These 
features typically occur at the concave junctions of two planar faces of an object. 
The results are shown in figure (4.2). Again there are two subsidiary constraints, 
one for multiple responses and one for zero response to constant input. Note that 
the difference between the two operators is essentially their "resemblance" to their 
respective inputs. We would expect this from the theory of Wiener filtering. The 
optimal Wiener filter for a signal in white Gaussian noise is just the time-reversed 
signal. Wiener filtering considers only signal to noise ratio however, and the 
localization and multiple response criteria impose effective smoothness constraints 
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on the operator. 



A roof edge deteetor ha S not been incorporated into the current edge detector 

because ,t was found that ideal roof edges were redely rare. In any case the ridge 
detec to r „ approximatjon to the jdea] rQof detectorj Md ^ 

w,t t em. The s.tuation may be different in the case of an edge detector designed 
exphctly to deal with images of po.yhedra, like the Binford-Horn line-finder (,971) 
Here several width of roof operator ma y be desirab le to deal with different sign , 
to noise ratios in the image. 

The method described above has been used to find optima, operators for both 
ndge and roof profiles and in addition it successfuHy finds the optima, step edge 
operator erived in chapter 2. It should be possible to use it to find operators for 
arb.trary features, and for optima, step operators to deal with non-white noise. For 
examp le the problem rf ^^ ^ ^ fa ^ ^ ^^ 

that has been passed through a perfect differentiator) reduces to the problem of 
detecfng roof edges in white noise. So the optima! detector for step edges in this 
case ,s the derivative of an optimal roof operator. Note that it is not the same roof 
operator which we derived here because the latter includes the zero DC response 
constraint, which does not translate to something useful for the step operator. 
4.2. In Two Dimensions 

We now face the proble m of extending the one-dimensiona, ridge operator 
to two dimensions. As in the case with step edges, the extension is an operator 
composed of a detection function norma, to the ridge direction and a projection 
functmn paral.el to it. As before we non-maximum suppress the output of the 
convolution of the image with this mask norma, to the edge direction. The maxima, 
pomts can then be threshold (with hysteresis) and the marked ridge contours can 
be combined with the edge map. 

In the case of ridges however it is much harder to obtain an accurate estimate 
o the ndge direction. There is no simple measure ,ike the gradient direction which 
ahgns with the norma,. Whi,e it is true that the larger principa, curvature wi„ be 
norma, to the ridge direction, this is a much less reliab.e quantity to measure from 
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even a smoothed image. Remember that the ridge detector will be operating below 
the resolution of the smallest step edge operator, so the degree of smoothing will 
be slight. This suggests that it will be necessary to use several oriented masks at 
each point and to choose the one which best fits the ridge locally. This has been 
found to be an inadequate solution because it performs so poorly when the ridge is 
highly curved, as is generally the case for printed text. While the highly directional 
masks have advantages for long straight ridges, they are not adequate as general 
ridge detectors. 

In practice a measure similar to the curvature must be used. The direction of 
principal curvature cannot be used directly, and not merely because it is a noisy 
measure. The peak of a ridge should be approximately flat in the ridge direction 
but highly curved normal to this direction. But there are points on the sides of 
the ridge that are approximately planar. Here the direction of greatest curvature 
will be arbitrary. It is quite possible that it will happen to be parallel to the ridge 
direction and that there may be a slight maximum in this direction, and hence a 
ridge point will be marked. 

To prevent these erroneous points from being marked it is necessary to modify 
the ridge direction estimate so that it takes into account the slope of the ridge 
normal to the direction of greatest curvature. This slope will be approximately zero 
if the point is at the top of the ridge, but for points on the side of the ridge where 
the greatest curvature may lie parallel to the ridge direction, the slope normal to 
this direction (which is the slope of the ridge face at that point) is large. So instead 
of non- maximum suppressing in the direction of maximum curvature, we use the 
direction n which maximizes 



*' + «*. 



dl 



dn 2 ' Ldn±J 

where n-L is the normal to n, and a is some positive constant. 

So in regions of low curvature, the above measure chooses a direction in which 
the slope is large, which is the correct behaviour for points on the sides of a ridge. 
This method has been used and seems to behave quite well even on difficult ridge 
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data e. g fi„e,y printed text . M ^ rf ^ ^^ ^ 

is given in chapter 6. ""<*ges 

A second prob.em with using non . maximum ^.^ ^ ^ 

t s resp :r r an ideal ridge has tw ° side ,obes ° r ° pposite - * ^ -» 

eak. These wfi, lead to negative ridges or ^ ^ ^ ^ ^ 

I! a : V1C r erSa " lD ^ thCre Wi " alS ° ^ maXlma '" ^ *» Sector 
output on ei the r s,de of a step edge, and clea rly these points shouhi not be ma rked 

r: f m - StarUng t0 ™ iDt0 - -— - — t ing descHptJ 
fferent fcaWres, which ,s much more difficu.t than the intention of daU about 
hesame kmd of feature f r o m different operators. T ypi cal features in an image wi„ 
fcad to responses from severa, different kind of feature detector, and some decision 
must be made as to which feature best represents the image. 

The question of feature integration was addressed in so me detai. by Marr 
7 ) wo stressed the importance of producing a sing.e coherent representation 
of mtns.ty changes caUed the "Prima, Sketch". This incorporated descriptions of 
severa, krnds of feature inCuding edges and wide and thin bars. The development 
of effect™ a,gorithms for the combination of arbitrary features was not carried 
very far by Marr, who rather apphed a se.ection criterion to the feature detector 
outputs at each point. In practice the responses rf ^ ^ 

e esact.y coincident in space, and it is not dear how the se.ection criterion is 
t be modified. In cases Hke these surface fitting is a usefu, technique as in the 

topograph, prima, sketch" of Harafick (1983). Here each possibie interpretation 
has associated with it an error measure that mirrors the difference between the true 
■mage and the modefied surface. The best feature to describe each ima g e point i, 
the one with lowest error. 

The probiem of integrating the ridge description with the edge detector output 
as not been satisfactory soived to the present time. A generation of the 
feature synthesis approach described in section 3.1 has been indented and gives 
acceptab,e resuits on some images, but is not robust enough to be used genera,* 
In t e case of combination of operator outputs, preference was a.ways g iven to the 
smaher operators, and the feature synthesis a,ways proceeded in one direction For 
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merging feature descriptions, where there is no a priori reason to prefer one feature 
to another, it should be symmetric. 

For example, suppose a ridge detector and a step edge detector both respond 
to a feature that is roughly a step edge. From the edge points marked by the 
edge detector, we synthesize the ridge detector output that would have occurred 
had the image actually contained a step edge. Then the ridge detector output is 
compared with the synthetic output and if it is not significantly greater, no ridge 
point will be marked. Similarly from the ridge detector output, we reconstruct the 
edge detector output that would have occurred if the ridge detector accurately 
described the image. The edge detector output will (in this example) be much 
stronger than the synthesized output and so an edge point will be marked. This 
method has the advantage that it is possible to mark the occurrence of more than 
one type of feature at a point in an image. It has been found that such points do 
occur in images (Herskovits and Binford 1970) in particular roof edges are often 
superimposed on step changes and "edge effects" which are similar to ridges also 
often accompany step changes. 

It is necessary to consider ridges and valleys as different features as the detector 
may output both kinds of interpretation near a single feature in the image. Some 
form of integration technique such as feature synthesis or goodness of fit testing 
must be used. This has not been done in the present implementation, which only 
integrates one of these features with the step edge map. This is one area where a 
lot of work still remains to be done, and several feature integration techniques need 
to be tried. 
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5. Implementation Details 

The ultimate test of any edge detector is its perforce in some Ration 
on rea , mage , The transiation rf . ^ ^ ^ ^ ^ ^ ^ 

Wh,,e he runnmg version of the program is actna lly very sm a„, it is stil, the result 
of much refinement. The refinements are as much a part of the design of the edge 
de eetor as „ the theoretica, analysis presented in the first four ehapter, It is 
not the mtent of this chapter to describe any such program. It will describe in an 

b' 7 rr Tal a ' g0rithmS ^ imPlementing """ ° f the """-* "**■■ 
by the edge detector. The author feels that this is necessary for severai reasons 

0) Smce the edge detector involves a considerable amount of computation, especial.y 
convolution, it is important that efficient algorithms be used if it is to run in 
a reasonable amount of time. 

(ii) Because images may contain very fine detail it is i mportant that loca , ^.^ 
mvolve the minimum number of pixels, but provide the best accuracy from 
hem. Tins applies in particular to operations such as the calculation of 
d.rectio„al derivatives and non-maximum suppression. 

(Hi) Edge detectors are not vision programs. The implements should bear in mind 
that the detector is only the first stage in a much larger system, and should 
g.ve consideration to the choice of representation of its output. 

This chapter will not describe in detail ali the aspects of the present edge 
detect h eme , jn fact some Qf ^ a , gorithms presented ^ ^ ^ ^ 

of the scheme, but may be used in future implementions. Instead it will focus 
on two or three of the more critica, aspects of the implementation, in particu.ar 
on efficent methods for convolution with Gaussians, and on the detaUs of the 
non-maximum suppression scheme. It wi.l close with details of a control abstraction 
for the programming of local parallel operations which are used extensively by the 
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5.1. Effects of Discretization 

All of the analysis to date has assumed that the image was a continuous 
differentiate surface and that the edge operators were likewise continuous functions. 
Since most implementations of the detector on digital computers will employ 
discrete filters applied to sampled image data, it is necessary to find accurate 
discrete approximations to the continuous filters. It is also important to consider 
the effect of the smoothing filter (if any) which was applied to the image before it 
was sampled. Smoothing filters are necessary to prevent aliasing of high frequency 
components in the sampled signal. Suppose an image is smoothed, sampled and 
convolved with a discrete filter. By the associativity of convolution, this is equivalent 
to convolving the image with a filter that is the result of the convolution of the 
smoothing filter and the discrete filter. If the image can be smoothed with a 
Gaussian smoothing function, no convolution is necessary at that scale. 

The simplest way to approximate a continuous filter with a discrete filter is to 
sample the former. If this method is to succeed, we must ensure that the sampling 
does not introduce aliasing. Consider a continuous first derivative of Gaussian filter 
of the form 

/(*) = -J ex p(~y 

Suppose that the image is sampled at intervals of r, (the filter must also be 
sampled at this rate for discrete convolution) then the Nyquist frequency is \. The 
effective bandwidth of the filter should be less than half this frequency to prevent 
aliasing. The Fourier transform of this filter is 

F(u) = \/27ria;exp( —J 

The cutoff frequency is f , and substituting we find that the amplitude at cutoff is 

V^£exp(-^-) (5.1) 
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Th,s funct.on never reaehes zero amplitude for large u , but it does approaeh 
zero very rapidly. We can set the effective cutoff frequency at the point where this 
function fa „ s to 0.01 of its maximum value. This luniU the smallest value of , that 
we can use for a given sampling rate. The minimum va.ue of „ is approximately r 
Th 1S ,s ,n fact the sma.lest operator size used in the present implementation. 

A second problem arises when we try to approximate infinite Gaussians with 
finite impulse response filters. Once again we can exploit the fact that the Guassian 
decays to zero very rapid.y in the spatial domain. In the current implementation, 
the Gaussian is truncated at about 0.001 of its peak value. This constrains the 
ratm of the number of samples in the (discrete) impulse response to the width „ of 
the filter. This ratio is typically 8, e.g. to approximate a filter with a a of 2 it is 
necessary to use at least 16 samples. 

Finally, it should be mentioned that it is sometimes possible to dispense with 
the convolution step entirely. If the desired value of „ is much less than r, discrete 
convolution is not practical. However, an equivalent convolution may be performed 
with a continuous filter (the smoothing filter) before the image is samp.ed. Since 
» theory the smoothing step is necessary anyway, no extra computational effort is 
required. We find in fact that the smoothing function is determining the performance 
of the subsequent edge detector, and the use of Gaussian smoothing should give 
near optimal step edge detection. 

5.2. Gaussian Convolutions 

It is perhaps surprising to see an entire section devoted to what seems 
a very stra.ghtforward and specific computation. However, there are several 
interesting properties of the two-dimensional Gaussian that suggest fast a.gorithms 
for convolution. In particular, the central limit theorem implies that repeated 
convolution with any finite filter tends in the limit to a Gaussian convolution We 
begin with a review of discrete convolution. 

5.2.1. Discrete Two-Dimensional Convolution 

The °^P«t°ftheconvolutionofadiscreteimage/(n, m) withatwo-dimen S iona. 
niter /(», j) , s g i ve n by the double summation 
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0(n,m) = £ £ I(n — i,m — j)f{i,j) 

assuming that the filter / has the same size M in both dimensions. This method 
requires M 2 multiplications and slightly fewer additions for each output point 
computed. It is a general method and will work with any two-dimensional finite 
impulse response filter. 
5.2.2. Convolution using One-Dimensional Decomposition 

We now consider a more specialized form of convolution which is applicable to 
a limited subclass of two-dimensional filters. The subclass is the class of separable 
two-dimensional filters. This class is characterized by the decomposition of their 
impulse responses into independent linear filters 

/(•',;■) = fx(i,o)*f y (o,j) 

where the * denotes convolution, and the filters f x and f y have only M non-zero 
components. By using the associativity of convolution, we break the convolution of 
an image with the two-dimensional filter into two convolutions with linear filters 

/*/ = I*[f x *f y ] = [I*fx]*fy 

This method requires only 2M multiplications and about the same number of 
additions per point. For large operator sizes (values of M of 64 are common) this 
method is substantially faster than the naive method, but is limited to separable 
filters. The number of useful members of this class is actually quite small, and 
the Gaussian is in fact the only two-dimensional symmetric function which can 
be decomposed in this way. Other useful separable functions include the first 
directional derivatives of the Gaussian in the x and y directions. 

5.2.3. Recursive Filterring 

So far we have been approximating the infinite Gaussian function with finite 
impulse response filters. It seems that the approximation could be more accurate 
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if we were to use infinite impulse response (recursive) filters instead. We can again 
make use of the separability of the two-dimensional Gaussian, and can therefore 
reduce the filter design problem to that of designing a one-dimensional filter 
winch approximates a Gaussian. The infinite impulse response (IIR) filter can be 
characterized by the equation 

Z P 

y(n) = £ ai x{n - i) -f £ b jy (n - j) ( 52) 

where x(n) and y(n) are the input and output respectively at the n'* point. This 
filter is ronghly equivalent to a continuous filter having P poles and Z zeros. The 
positions of the poles are determined by the coefficients 6, while the zeros are 
determined by the a,. 

The immediate drawback of using snch a filter to approximate a Gaussian is 
that the filter is infinite "in one direction only", and that the Ganssian has an 
impute response that extends to infinity in both directions. The solution to this 
problem is illustrated in figure (5.1). We employ two recursive filters moving in 
oppose directions, each of which has an impulse response which is approximately 
a half-Gaussian. We then sum the two responses (and subtract a component at 
the centre point which is doubled) and are left with a first approximation to the 
symmetric Gaussian. We can if we wish repeat this process on the filtered image 
and (by the central limit theorem) we will obtain a very close approximation to the 
Gaussian, as shown in the last frame of figure (5.1). 

The half-Gaussian is approximated by a damped exponential cosine, which 
reqmres two poles and two zeros in the recursive filter. The 6 coefficients are derived 
by considering four discrete output values near a zero-crossing of the response. We 
choose the values to be 



exp(ar)sin(-*r) exp(-ar) sm(cr) exp(-2ar) sin(>r) 

Application of equation (5.2) to the first three and last three values gives two 
equations each of which involves only one of the 6 y ,and the solution is 



74 



MM 



a 




-#AjMa*0«* 



•-- ze it » at ita i» 



W1H1NtNtMm2HmWi»W3HNI1M<l» 



W H6« «« 611 




-t<MHIMl^««P> , ^« 



■w •• "W 



%, — m — jm-^a — m r m zn zm n* m v m an vm w> u* m * m >m wt >wt srr 




S5 m 5 «S w wi — in — waiai<i a iai ) inHi i w » iwwww<w — rnr 



Figure 5.1. (a) and (b) Recursive Half- Gaussian filters moving from left to right 
and from right to left respectively, (c) sum of these, (d) Result of two applications 
of recursive filter and (e) True Gaussian. 
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h = 2exp(ar)cos(o;r) 6 2 = — 



exp(2ar) 



where a and u are the decay constant and angular frequency respective*, of the 
damped exponential cosine response. Typical values for a and w are 



Q ~ - « = 0.8a (5 3) 

where a is the standard deviation of the equivalent Gaussian. The a,- determine 
the gain of the filter and its first derivative at the origin. For unit gain and best 
approbation to the slope of a Gaussian we use 



-0 = 1.0 a^expp^-ft, 



(5.4) 



The mteresting feature of this method is that its complexity is independent 
of a. n fact for a single pass approximation, it requires only 12 multiplications 
and add.tK.ns per point (3 each for filtering in f our directions). It also requires an 
extre^ small number rf ^ ^^ ^ ^ ^ ^ ^ ^^ ^ 

further by saving previous x and , values in registers (only three are needed) It is 
possmle to implement the algorithm using only 4 references per point. In practice 
:t , usual* better to make two passes over the image, so the above figures should 
be doubled. 

This particular method is of course even more specialized than the previous 
methods, and is on.y useful for Gaussians and certain other infinite functions 
However it has the .owest complexity of any algorithm discussed, and is very 
economical with regard to memory ^^ ft ^ ^ ^.^ ^ 

the fi,ter size can be varied by simply changing the value of some parameters, 
w.thout affecting execution of the a Ig orithm. It would seem to be the first choice 
for any future implementations. 
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5.2.4. Binomial Approximation 

In the last two sub-sections we saw that by using the special properties of 
Gaussians we were able to reduce the number of multiplications required to perform 
convolution. The resulting algorithms were no longer general convolutions but were 
restricted to subclasses of two-dimensional filters. It is possible by exploiting the 
full power of the central limit theorem to produce an algorithm that requires no 
multiplication at all. Recall that repeated convolution with any spatially limited 
filter tends to an equivalent Gaussian convolution. If we choose the filter to be 
addition of two adjacent points, and if we repeat the addition many times, we can 
obtain a Gaussian approximation without multiplication. 

A useful analogy to this method exploits the equivalence of discrete convolution 
and polynomial multiplication. The filter produced by the addition of two consecutive 
samples is isomorphic to multiplication by the polynomial (x -\- 1). If the filter is 
applied n times it is equivalent to multiplication by the polynomial (x -j- l) n . The 
coefficients of this polynomial are given by the binomial theorem. We then use 
the fact that for large n, the binomial distribution may be approximated by a 
Gaussian. This method is economical in terms of multiplication, but its complexity 
is relatively high. Since the variance of two distributions add when the distributions 
are convolved, the standard deviation a only increases as the square root of the 
number of convolutions. The value of a after n applications of the addition filter is 

To phrase this result in the terms used in the rest of this section, we find the 
number of samples M required for an equivalent discrete convolution. Since M is 
typically 8cr, the relationship between the number of additions per point n and the 
equivalent mask size is 

n = — M 2 
64 

The overall complexity of this algorithm for two-dimensional convolution 
(assuming it is used with decomposition) is §%M 2 additions per point, with the 



77 



num er of memory references being roughly the same . Thls ^ 

-al, number, but thc expo „ ent , hjgh The sma)iest va)ue ^ m ^^ ^ e ^ T 

o b^ used ,s 8 , and so the minimum ^^ of addjtions ^^ ^^ 
18. However both the num ber of additions and the number of memory references 
g rows with M> even though we are exploiting separability. Since the tiL re.uil 
far floating po.nt addition is often not mU ch .ower than the time for a mu tiply 
this method rapidly becomes unattractive as M increases. 

5.2.5. Fast Convolution 

In recent years, very fast algorithms have been developed for integer or 
o.ynom.a, muIti lication ^^ ^ ^ 

TT:r {U) iD ^ ' ength ° f ^ "*— ^ »*«■ in L cas 
convo ution we typically use a filter that is m uch shorter than the length of 
*e mput, an we hould therefore expect ^ asymptotic tjme ^ 

* be n ependent of the filter length. Attempting to achieve anywhere near the 
asymptofc comply however, would introduce prohibitively high constants. 

But afl is not lost. By using the simplest form of fast multip.y, we can gain 

a very useful speedup with relatively low overhead Consider ♦ 
n„™k<, u- i. overneaa. Consider two sequences of 

numbers wh.ch are to be convolved. We assume w.l.o.g. that the length of the 
fences ls 2n , ^ we ^ ^ ^ ^ ^ ^^ ^ 

Le e wo fences be . and « and denote the subsequences as Xl , x , Md , 



x = 



-xi\+x 2 and y = yi \ +yi 

where ;» d -^'eft-shirtingofthe Seq uenceby„.Wethenusethedistributivity 
of convolution over addition, and we have 



»» = (*^+*,)*( yi X + y 2 ) = ^.U+^. + .^Ju,,, 
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From which it would seem that computation of a 2n length convolution requires 
4 convolutions of length n, implying an order of growth of n 2 . But the above 
expression can also be written as 



X * y = xi * 2/1 *l2n + [(*1 ~ *2) * {y2 — Vl) + *1 * Vl + *2 * 2/2)1 1n +*2 * 2/2 

Which requires only 3 convolutions of length n, plus some extra addition 
and subtraction. We can recursively apply this technique to compute these shorter 
convolutions, and we arrive at an algorithm with lower order of growth than n 2 . 
This leads to recurrence relationships for the number of multiplies and additions 
to perform a convolution of length n 



C m [n] = 3C m 



C a [n] = 3C 



n 

L2J 



+ 4n 



where C m is the number of multiplications and C a the number of additions for 
an n-point convolution. When these recurrences are expanded and simplified we 
obtain for the multiplication and addition complexity 

C m [n] = 3 L w 3 log2(n) = nJSi « n L6 



C a [n] = 6 . 3 L — 8 . 2 L -f 2 » 6n 



1.6 



for large n 



(5.5) 



where L is the smallest integer greater than or equal to log 2 (n). To translate 
these results into the context of convolution of a long sequence (say length n{) 
with a much shorter one (length TI2), we assume n = 712. We will require ^ 
such convolutions, and each one will require about nj* 6 multiplies. The resultant 
complexity is ^nj* 6 = rcing' 6 , so we find ^ na ^ the complexity is linear in the 
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■eng h of the longer seqU£nce but ^^ var . es ^ th£ ^ 

shorter se q uence. Por two-dimensional convolution) ^ ^ 

is about six times this. "'piexuy 

Thus we have the remarkab.e result that this method has almost the 

::::; y : convolution with — d — u, b tu 

izjtrr r: dimensionai mask - The aig ° rithm has b - **—« 

and e r y tests md.cate that it starts to exhibit its reduced complexity over naive 
onvolutmn at about „ = 1, At n = 1024> the speedup ^ ^ J < ™ 
be .mplemented m 6„ space , and the numbef rf 
same as the number of additions. 

con M- l0W T ° f ^ C ° mP,eXity C ° DStantS make this ™*W ^ster than 

~ r ? g fast Fourier transforms f ° r ^ ° f » - - — -- 

(cau mn th,s number ,s very hardware-dependent). It is also somewhat easier to 

:: ; ;; e ; he t T since - * has a -— ■«- — «- * * •*. c 

The I T ^ to geDeral C ° nVOlUti0n f ° r " fa *« »■» 16 to 16000 
2 7 T " P0SSib ' e t0 USe ***»**»* -*• «* have exact, y £ 

™ e !;: r: detection ° perator - rath - *- g — -— . 

wmie the fast convolution a gorithm has not w k- • 

rW. f ., . S not yet been incorporated into the edge 

detector, it is certainly worthy of further experiment. 

5.2.6. Sub-Summary 

Hopefully this section has hidilie-htpH fK« *w *k *. *i_ 
• f ... d5ni e nil gnted the fact that there are frequently manifold 

,ere S tm,,m P ,emen^ 

has another purpose. When t ry in g to soive a vision problem the first consideration 
houM e mot.vationa,, i.e. what shou.d this a, g orithm compute. The second 
£-** e, what can be computed from an ima g , Onl y after these twT hj 
bn treated should there bean y consents imposed b y tractabi,it y , i.e. Z 

n be computed in reasonable time. The choice of a. g orithm sho ud no! b 
Pjeiud-ced b y considerations o, emcie„c y until much consideration has been g iv!n 
o ^mentation, and on, y „ it seems ,i kely that no efficient a, g orithms 2 
^ the computation. This theme is characteristic of the wor k of Marr (1 97G) I 
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argued strongly for a breakdown of image processing problems using the above 
considerations. 

5.3. Non-Maximum Suppression 

The optimal edge operator was derived under the assumption that edges 
would be marked at maxima in its output. For two-dimensional images, finding 
these directional maxima is straightforward but there has been quite a bit of 
experimentation with various non-maximum suppression schemes. The operation 
should be as local as possible i.e. it should rely on pixels that are close to the 
potential edge point, but it should also be robust and accurate. 

The non-maximum suppression scheme described here may be used in either 
of two ways. In the first instance the edge direction is estimated from the gradient 
of a Gaussian- smoothed image surface by simply differentiating in the x and 
y directions. The gradient magnitude is then non- maximum suppressed in the 
gradient direction. This is just a possible implementation of equation (3.1). In the 
second case, the algorithm is used for non-maximum suppression of the outputs 
of directional masks. Here the gradient direction is fixed and is a property of the 
operator. We again non-maximum suppress the gradient magnitude, which in this 
case is the magnitude of the response of that operator. 

In either case the algorithm is the same. It uses a nine-pixel neighbourhood as 
shown in figure (5.2). The normal to the edge direction (either the gradient or the 
prefered operator direction) is shown as an arrow, and it has components (u X) u y ). 
We wish to non-maximum suppress the gradient magnitude in this direction, but 
we have only discrete values of the gradient at points Pij. We require three points 
for non-maximum suppression, one of which will be P XtV and the other two should 
be estimates of the gradient magnitude at points displaced from P XtV by the vector 
u. 

Now for any u we consider the two points in the 8-pixel neighbourhood of P XtV 
which lie closest to the line through P X)V in direction u. The gradient magnitude 
at these two points together with the gradient at the point P XiV define a plane 
which cuts the gradient magnitude surface at these points. We use this plane to 
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Figure 5.2. Support of the non- 



maximum suppression operator 



locally approximate the surface, and to estimate the value at a point on the line 
For example, in figure (5.2) we estimate the value of a point in between P, y+i and 
P*+i*+i that lies on the line. The value of the interpolated gradient i. 



IS 






Similarly the interpolated gradient at a point on the 



opposite side of P %%y is 
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U U V — u * 

G 2 = -*G(*-1,»-1) + G(x,y-1) 

Uy Uy 

We mark the point P X)V as a maximum if G(x,y) > Gi and G(i,y) > G2. 
The interpolation is similar for other gradient directions, and it will always involve 
one diagonal and one non-diagonal point. In practice, we can avoid the divisions 
by multiplying through by u y . 

This scheme involves five multiplications per point, but this is not excessive, 
and it performs much better than a simpler scheme which compares the point P x>y 
with two of its neighbours. It also performs better than a scheme which used an 
averaged value for the gradient along the edge, rather than just the value at P X)!/ . 

5.4. Mapping Functions 

We close this chapter with a brief discussion of a general approach to program 
structure for image processing algorithms. The development of a low-level vision 
program requires many repetitive operations on each point of the image. There 
will be some set of dependencies between the results of these computations, which 
implies that there is a natural (partial) ordering of computations, and that some 
intermediate results must be computed and saved somewhere. Aside from any 
hardware (or object code) considerations, there are two basic ways of implementing 
a software interface for this kind of processing. 

The first and most obvious way is to provide a set of primitive array operations, 
such as addition or convolution, which take arrays as arguments and store results 
into arrays. Programs written using these primitives look like normal sequential 
assembly code, unless there is some complex function (built from the primitives) 
which must move over the array in a non-standard way. In this case the arguments 
describing the way the function is to be moved must be repeated in each call to a 
primitive. This leads to cumbersome and non-orthogonal code. 

The second approach is to provide a mapping function which takes a local 
image processing function as an argument and moves it over some number of arrays, 
with the motion specified by other arguments. The two operations of mapping 
and local computation are now handled by separate functions. This method is 
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rem.mscent of the MAP- functions in Lisp, (Moon, Stallman and Weinreb 1983) 
and ,s s.milar in p hil osophy to the concept of iterators in CLU (Liskov et al 1979) 
It has a number of advantages at the user level. The first is that the local function 
can be written in the source language e.g. Lisp (eV en if the mapping f unct ion 
turns ,t mto something quite different) and tested on a sample set of arguments 
w.thout having to generate test images. The source code is more compact and' 
less error-prone, and subjectively more readable. If parallel processing hardware is 
avadable, the source code would compile into something like the code using the 
first method. There would be no significant differences in the execution efficiencies 
of the two methods. 

For a serial machine though, there are concrete advantages in directly 
implementing something like the second method. Since the local function is 
evaluated completely at one point before moving, to the next, there is only one 
storage location required for each intermediate result. This contrasts with the 
former method which needs a block of storage for each intermediate result The 
mtermediate values may be held in high-speed storage (registers or cache) which 
greatly reduces the number of memory accesses required to apply the function to 
a full unage. There is also a very considerable speedup possible when the local 
unctmn uses conditional branching, such that the time to process an image point 
depends strongly on the data values at that point. For (most) parallel machines 
the tl me to apply the function to n points will be the worst case time for a 
one-pomt application. Conditional branching must be accomplished by setting a 
non-execution flag for the length of the code to be skipped. No advantage can be 
taken of sparse data, or computional shortcuts such as recursive filtering. 

It is the author's experience that the latter style makes code development much 
ea S1 er (th.s was a serious consideration in the implementation of the algorithm 
much of which is written in Lisp machine microcode). This is true to the extent' 
that some of the more complicated functions, such as the sparse directional masks 
could probably not have been implemented using the first approach because of the' 
sheer amount of code. The edge detector has been implemented partially using the 
first approach, and fully using a mapping function (which is itself microcoded) The 
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mapping function maps over any number of arrays, with arbitrary increments for 
each array, and can store results into several output arrays if the function being 
mapped returns multiple values. The difference in execution times between the two 
methods is small, but the second method uses much less array storage, and has a 
much shorter source. 
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6. Experiments 

It has been s tre SS ed that edge detection is only the first stage in a vision system 
and that the performance of the detector can only be gauged in its context. It has 
also been argued that the requirements of many of the later modules are similar 
to the extent that it should be possible to design a detector that wili work wel. in 
several contexts. Starting from this assumption we proceeded to design a detector 
based on a precise set of performance criteria which seemed to be common to these 
later modules. We saw in chapters 2 and 4 that it was difficult to capture exactly the 
mtu.fve" criteria that we originaUy defined. It is virtually impossible to capture 
all of the desirable properties of edge detection in a finite set of criteria and in 
the final analysis the only valid criterion is the performance of the detector on real 
data. Th,s chapter is concerned with evaluating performance at the experimenta. 
level, and will include comparisons with some other edge detection algorithms. The 
evaluation is in three stages: 

(i) Validation of the analytic performance criteria. The operator has been designed 
to optimally detect step edges in Gaussian noise. It should perform wel. on 
synthetic images of steps. 

(ii) Subjective evaluation of performance on real images. The intention here is 
to venfy the operation of various parts of the algorithm, in particular the 
mtegration of different operator widths and orientations. It is not possible 
to vahdate these by inspection of the detector output, but defects in their 
operation can often be isolated in this way. That is, we cannot tell by looking 
at the output whether the doctor is working perfectly, but we can often tell 
where it is failing. 

(hi) Evaluation of detector performance in some context. Since an edge detector 
■■ the first stage in many vision programs, it is appropriate to compare edge 
detectors by comparing the performance of the program which uses the two 
detectors. If the original assumption that many vision modules have simila, 
reqmrements is valid, a detector designed using these criteria should perform 
well with all modules. 
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Finally we will present some simple demonstrations of properties of the human 
visual system which are consistent with the edge detector presented here. While 
this does not prove that the human visual system performs the same computations, 
it does suggest that the two systems share a common set of goals. It reinforces the 
choice of performance criteria and gives evidence that an edge detector designed 
using these criteria can perform well on a great variety of images. 

6.1. Step Edges in Noise 

In chapter 7 we will demonstrate that a directional second derivative edge 
operator gives better localization than the Laplacian when applied to a Gaussian 
smoothed image. We have also claimed (chapter 2) that a difference of boxes operator 
gives unacceptable multiple response performance to a noisy step edge. We should 
test those results experimentally now. In figure (6.1) We have a two-dimensional step 
edge with additive white Gaussian noise. The successive frames show the responses 
of difference of boxes, Laplacian of Gaussian and directional first derivative of 
Gaussian operators. The signal to noise ratio of the image, defined as the ratio of 
the amplitude of the step to the standard deviation of the noise at each pixel, is 
about 0.2, and the image is 256 by 256. The Gaussians for the Laplacian and first 
derivative operators both have a a of 8.0 pixels, while the box masks have a length 
of 32 pixels in x and y. 

Processing of the output of the difference of boxes operator after the convolution 
step is identical with the directional first derivative operator, that is, it is 
non-maximum suppressed and thresholded with hysteresis. For the Laplacian of 
Gaussian, the sequence is slightly different. Edges are initially marked at the 
zero-crossings in the convolution output, then the gradients of the convolution 
output are computed and the gradient magnitude is thresholded with hysteresis. 

For each operator there are - two thresholds that are set empirically to give the 
best subjective output. Figure (6.1) gives a guide to the localizing and multiple 
response performance of each of the operators. As we would expect, the directional 
first derivative operator gives subjectively better localization than the Laplacian, 
and the difference of boxes produces several contours in response to the single edge. 
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Figure 6.2. Effect of changing operator thresholds. In the first row, the 
thresholds are increased by 50% , and in the second row they are reduced by 50%. 
The order of operator outputs is (from left to right), (i) Laplacian of Gaussian, (ii) 
difference of boxes and (iii) first derivative of Gaussian. 
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To gaill so ,ne idea of the detection performance (i.e. sigIlal to noise ratio) rf 

he operators, we can see how their outputs vary as we change the thresholds f rom 
t e o pt va)ues . The ^ ^ of ^^ (6 ^ shws the resu]t ^ ^ 

a.! thresholds h y 50%, and in the second row the thresholds are reduced from 

rom the optimum levels by 50%. From these we can infer that the difference of 

boxes operator has the best signal to noise ratio, since for all threshold levels it 

marks edges only in the vicinity of the step. Of course signal to noise ratio is only 

one component of detection performance, and .ack of multiple responses is the 
other. , n t , respect the diffefence Qf boxes performs ^ poor]y) Md ^ 

the thresholds are lowered. The Laplacian of Gaussian exhibits poor signal to noise 

rat,o compared to the other two, and it is not possible to set the thresho.ds so that 

the full length of the edge contour is marked without introducing contours due to 
noise. 



It may be argued that the problems with Laplacian of Gaussian or difference 
of boxes operators can be circumvented by app.ying >unin( ,> heuristics to ^ 
outputs. For examp.e, it maybe argued that it is possible to eliminate erroneous 
^ " the diffefenCe ° f boxes «** that are W the edge, or to use the 
outputs of different Laplacian of Gaussian channels to reinforce the evidence of 
an edge. This argument misses the point. The optima, operator derived here, or 
the first envative of Gaussian approximation to it, gives the best performance for 
a „»* hnear operator when followed by non-maximum suppression. In order to 
-Prove the performance of the other operators, non-loca. predicates have to be 
apphed wh,ch in a sense make the filtering step redundant. 

6.2. Operator Integration 

We have argued that in order to handle a variety of images, an edge detector 
hould mcorporate operators of different widths. We have also argued for highly 
d.rect.onal masks when they are applicable. All of these operators respond to the 
-me type f feature, a step edge, and where several of them respond to the same 
edge^he detector must mark a single edge only. The problems with the integration 
of Afferent operator outputs are very great. In fact many of the arguments against 
d.rect.onal operators have been pragmatic, that it is difficult to combine oriented 
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operators and produce a coherent output. The problems with combining operator 
outputs of different widths are even worse, because maxima in the outputs of two 
operators responding to a single edge may be displaced from each other. Chapter 
3 described feature synthesis as a method for combining several feature detector 
outputs. This method is used in the implementation of the edge detector, and we 
now explore how well it performs on some test images. 

6.2.1. Integration of Different Mask Widths 

The reader can readily gain an appreciation of the variety of detail that occurs 
at different scales in an image by reference to figure (6.4). This figure shows the 
edges marked by two operators on an image of a perforated cleaning cloth. The 
mask widths are a = 1.0 and a = 5.0 respectively. The edges at the two scales are 
virtually independent. In contrast, figure (6.7) shows the edges marked by operators 
with a = 1.0 and a = 2.0 on an image of some mechanical parts. In this image, 
almost all of the detail is picked up by the smaller operator, and it only fails on 
some of the shadow boundaries. These two figures capture the essence of the feature 
integration problem. Ideally every feature in the image should be marked, but only 
once. 

Our basic width selection criterion, that is using the smallest operator that has 
sufficient signal to noise ratio, should do the right thing on the parts image. The 
edges at the smaller scale will first be marked, then the larger operator output will 
be synthesized from them, and finally any edges in the large operator output that 
are not consistent with the synthesized output will be added. The result is shown 
in the figure (6.8a). The only difference between this and the figure (6.7a) is the 
addition of some shadow edges, and the extension of some shading edges. Similarly 
the figure (6.5a) shows the combined output from the edges in figure (6.3). Since 
the long shading lines in the image are not seen by the smaller operators, the large 
operator features that correspond to them are not synthesized. When they appear 
in the actual large operator output they are marked in the detector output. 

There is some freedom with the feature synthesis approach as to the "inhibition" 
effect of the synthesized operator outputs. Recall that the actual operator output 
had to be significantly greater than the synthesized output for an edge to be 
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Figure 6.3. (a) Cleaning cloth image 
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Figure 6.4. (a) Edges from cleaning cloth image with operator width a = 1.0 
(b) edges from operator with a — 5.0 
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Figure 6.6. (a) Parts image 
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Figure 6.7a. Edges from parts image with operator width a = 1., 
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Figure 6.7b. Edges from parts image with operator width a = 2.0 
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Figure 6.8a. Combined edges for parts image using feature synthesis 
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Figure 6.8b. Superposition of the edges for parts image 
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marked. In the present implementation the vector difference between the actual 
and synthesized gradient is first taken and if the magmtude of this difference is 
greater than the magnitude of the synthesized gradient, a new edge is marked. By 
mtroducing a sca.e factor into the comparison, the likelihood of a new edge being 
marked can be varied. This factor must be determined empirical*. The above two 
■mages p.ace conflicting requirements on the facto, If it is too la rge, the fuzzy 
edges in the cloth texture are missed. If it is small, there is duplication of edge 
pomts m the parts image, and consequent smearing of the edge contours. A single 
va.ue was found which gave the results shown in the two figures. This value ha, 
g.ven good results on all the images tried, and does not require "tuning" for a 
particular image. For comparison, below each of the combined edge maps in figures 
(6.5) and (6.8) „ the superposition of the edges from which the maps were formed. 
Two final notes. In order for feature synthesis to be effective on the cloth 
texture image, the edges at the smafler sca.e should be produced by an operator 
wh.ch „ ^sensitive to slow gradients as described in section (3.5). Otherwise the 
synthes.zed .arge operator output will contain a slowly varying component that wifl 
prevent new edges from being marked, even though the sma.1 operator did not mark 
he slow edges. This component is due to the s.ow gradient "leaking through" the 
large number of closely spaced edges from the smafler operator. In genera, feature 
synthes.s is most effective when the two features are independent. 

Also the combined output exhibits streaking of the long contours. This is 
because a sing.e scale factor is presently being used for comparison of real and 
synthesized outputs. This factor may be viewed as a kind of threshold, and therefore 
proved performance should be possib.e by using two values, i.e. by thresho.ding 
w.th hysteresis. This has yet to be tried at the time of writing. 
6.2.2. Integration of Directional Masks 

Recafl from section (3.2) that highly elongated directional operators are 
pre erred whenever they have sufficient goodness of fit to the image. The goodness 
of fit measure is simply the standard deviation of the gradient values at several 
pomts along the le ngth of the directional mask. A directional operator can only 
mark an edge if the sum of these values exceeds some fixed mu.tiple of their 
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standard deviation. This prevents a directional operator from responding to curved 
edges or from extending edge contours beyond corners. 

It also makes operator integration much easier. For one thing there will seldom 
be more than two directional operators responding to an edge of a particular 
orientation. Also the edges points produced by directional operators will not be 
displaced significantly from the edges produced by less directional operators if both 
have the same width. This is because the only way an edge point can be displaced 
is if the edge is significantly curved, but this will immediately prevent a directional 
operator from responding to it. Feature synthesis is not necessary for directional 
operator integration, and a simple non-maximum scheme suffices. 

The non-maximum suppression scheme was described in section (5.3). The 
only peculiarity of non-maximum suppression for directional operators is that 
the direction of non- maximum suppression is fixed a priori for each mask. It is 
normal to the long axis of the mask. Once all edges have been marked by the 
directional operators, the simple gradient magnitude is computed. A composite 
gradient is formed as the maximum of the simple gradient and the magnitude of 
the directional gradient. This composite gradient is then non-maximum suppressed 
in the gradient direction. The effect of forming a composite gradient is to prevent 
simple (non-directional) edges from being marked adjacent to directional ones. An 
example of the performance of this scheme is given in figure (6.10). The first frame 
of (6.10) shows the edges marked using simple gradient non-maximum suppression. 
The second frame shows the addition of directional operators at the same scale. 
Several additional elongated edges are visible in the second frame. There is no 
evidence of straightening of curved edges or extension of edges beyond corners. 

The edge detector has been used quite extensively by the author in practical 
systems. The first of these is a contour tracker which locates an edge contour in 
an image from a television camera, and plans a trajectory for a robot manipulator 
which follows the contour. The second system forms polygonal approximations to 
the bounding contours of objects in an overhead image of a robot's workspace. An 
example of this system is given in figure (6.11). These are then used to plan paths 
through the workspace which avoid the obstacles. It has also been used by others 
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Figure 6.9. Dalek image, approximately 700 by 500 pixeh 
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Figure 6.10a. Edges from Dalek image at a = 2.0 
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in the context of shape description (Brady and Asada 1983). It has been used as 
a front end for an implementation of the Marr Poggio stereo algorithm (Grimson 
1981) but a quantitative comparison with Laplacian of Gaussian zero-crossings in 
this application has not yet been done. We close this section with examples of the 
edge detector output on (as promised) a variety of images. These appear in figures 
(6.12) through (6.15). The images are all roughly 700 by 500 pixels and the time 
to process each one was about 10 minutes on an MIT Lisp machine with no special 
hardware. 
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Figure 6.12a. Quasimodo image 
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Figure 6.13a. Kent image 
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Figure 6.13b. Edges from Kent image using operator width a = 1. 
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Figure 6.14a. Westminster image 
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Figure 6.14b. Edges from Westminster 
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image, operator width o = 1.0 
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Figure 6.15a. Marine image 
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Figure 6.15b. Edges from operator width a = 1.0 



114 



6.3. The Line Finder 

An optimal operator for ridges was derived in chapter 4. It was pointed out 
that the extension of this operator to two dimensions was more difficult than the 
edge detector because of the lack of a natural property (such as the gradient) which 
could be used to determine the ridge orientation. The ridge detector must rely on 
a much more noisy quantity (which is approximately the direction of maximum 
curvature) to perform non- maximum suppression. Printed text is a difficult test 
case for a ridge detector because of the variation in orientation and width, and the 
presence of junctions. The ridge detector output on some printed text is given in 
figure (6.16). It uses a second derivative of a Gaussian to approximate the optimal 
operator derived in section (4.1). For reference, the edge detector output on the 
same image is given in figure (6.17). The ridge detector output is subjectively more 
legible, but in many places sections of contour are missing. 

In principle it should be possible to incorporate the ridge detector output in 
the step edge detector using feature synthesis (or some other feature integration 
approach). This has not been done to the present time and remains a challenge to 
low level vision schemes. 
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s position and orientation, bot siocc the basis fooclioo. wc uiua)l r ool 
be properties apply only to o projection of the scluaJ inw. t c surface 



Figure 6.16. (a) Text image, (b) negative ridge detector output using <r = 7 
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Figure 6.17. Output of the edge detector on the text image, a = 1.0 
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6.4. Psychopbysics 

We have made a case for a particular set of criteria on effective edge detector 
an we have claimed that ^ ^ ^ ^^ ^ & ^ ^ ^^ 

In theory any edge detector which is used in these apphcations should use similar 
cnten., and an algorithm which is consistent with these criteria. The human visual 
system provides structural information about the visual field to later processes, and 
human beings are adept at stereoscopic depth perception. It is reasonable to argue 
that .t should perform edge detection at an early stage. 

It is also reasonable (from the arguments given in chapter 3) to suggest that 
* should use a variety of operator widths and orientations. It should therefore give 
pre erence to small operators whenever they have sufficient signal to noise ratio. 
To test th.s hypothesis we need somehow to produce an image which has different 
detad at two scales, and then add noise to see if the percept changes. Such an 
-age ,s the coarsely sampled picture of Abraham Lincoln used by Harmon and 
Jules, (1 73) . The effect of. the coarse samp.ing is to introduce irre.evant detail at 
small scales. The detail makes the image difficult to perceive unless blurred. 

The same effect should be observed if we add noise to the image, because 
he SI gna. to noise ratio of the small operators will become intolerable before the 
larger os. ^ the _,,„ ^ ^ ^ ^ ^ ^ ^ 

wmie the larger channels will still contain coherent information. A coarsely sampled 
■mage of a well-known stereo-type (not Lincoln) is shown in figure ( 6 18 ) The 
successive frames have not been blurred but contain increasing amounts of additive 

^ J 8 " n ° iSe " ^ latef frameS - 6aSier to PerC6ive - a h — f-e. We 
therefore have the remarkable situation that adding incoherent noise to such an 
image makes it more perceptible. 

A second result of the analysis in chapter 3 is that highly directional operators 
have etter signal to noise ratio than less directional operators. The highly directional 
operators will not be as sensitive to rapid changes in the orientation of an edge 
conto Ur , Md will tend to make a ^ ^^ ^ ^ ^^ 

( .19) contams an series of paral.el fines which are locally curved but globally 
strait along their length. The fines are closely spaced so that larger channel 



118 




Figure 6.18. Image of a human face with varying amounts of additive white 
Gaussian noise. 



widths (which would also tend to give a subjective straightening of the contour) 
will have poor signal to noise ratio. When noise is added to this image, there is an 
apparent straightening of the lines. 

This would indicate a scheme which, in contrast with the present detector, 
gives preference to less directional operators when they have sufficient signal 
to noise ratio. However the apparent inconsistency can be resolved if we use a 
more sophisticated applicability test for the directional operators. In the present 
algorithm, directional operators would not be applicable in either of the frames in 
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^eI^^ d ^T G tlZn°tT ly Parallel liDeS With ****** variation in 



to (6.1.) because of *e^o7^~^~^^^^ 

The simple standard deviation apnlicabilitv m „ 

not straight or h t , n W'^hty measure « poor if the edge eontour is 

stra, g ht or breaks at a corner. It is also poor if the image is noisy but in th' 
case a directional operator is no less applicab.e. ff image noise is ta J i n 1 
>n e applicable metric, we would expect the addition of noise to el" 
apphcabdity of directional operators, consistent with (6.19). 
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7. Related Work 

Now that we have examined in some detail the design of an edge detector 
using variational techniques, we can contrast it with other schemes with respect 
to both its design goals and the methods used to attain those goals. In order to 
consider any appreciable fraction of the great variety of edge detection schemes that 
have been proposed it is necessary to form a categorization of these schemes. Most 
schemes in fact do not lie wholly within one of these categories, but retain aspects 
of several. We will examine several detectors based on their apparent commitment 
to the following goals 

(i) A decision as to the presence of an edge and an estimate of its location from a 
best-fitting surface that approximates the real image surface. 

(ii) To optimally estimate some derivative, usually first or second, at each point in 
the image and mark edges at local features in these derivative outputs, e.g. 
zero-crossings in second derivative or maxima in first derivative. 

(iii) Frequency domain techniques, which attempt to enhance edges by filtering. 
Here the filters are designed using frequency domain techniques to optimally 
discriminate step edges from the background, by assuming some frequency 
distribution for the background. 

All comparisons will be theoretical and generally quantitative, since for early 
vision level of performance of an algorithm can be crucial. For a more extensive 
survey the reader is referred to Davis (1975). For experimental comparisons the 
reader should see Fram and Deutsch (1975), which compares several operators 
applied to step edges in noise, and includes a comparison with human performance 
on the same synthetic images. Abdou and Pratt (1979) compare local differential 
and template matching operators based on a figure of merit which is very similar 
to the performance criterion used in the present detector. This figure of merit was 
introduced by Pratt (1978, p495) and is given by 



F= - -■ i 



.5 1 + * 



max(//, I a) <ri 1 + ad?(i) 
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who , ^ j ih „ UInber Qf ,, eal and ^^ edge pomtS) js ui£ 

I t ;: true edgei Md a is a sca,ing «"*«* ""* ^ rmin es 

the trade-off between detection and localization. 



7.1. Surface Fitting 



There are a number of edge detectors that are based on some kind of image 
surface modelling. These methods usually involve an initial parametria of the 

rr c ; ': terms of some set ° f basis funcu ° ns *»*-* * *• -*»*«« 

of the amphtude and position of the best-fitting step edge fro m the parameters. 

One of the earnest examples of this method was the Prewitt operator (1970), which 

used a quadratic set of basis functions. Another ear. y examp.e is the detector of 

Huec e, ( 1971 ). Hueckel's method uses basis functions with circular support, and 

tnes to fit a single step edge to each circu.ar are, The basis functions are chosen 

» as o g,ve an approximate Fourier Transform of the circuiar region. However 
as w lth t surface fitUng schemeS) th£ basis ^ . ^ comp]ete ^ ^ . 

bas, functions over a support of 52 pixe.s) and an edge is actually fitted to a 
smoo^ed version of the original surface. An argument is presented to the effect 
hat the cho.ce of a low-frequency subset of the complete space of basis functions 
does not prejudice the ability of the operator to detect and .ocalize edges, but proof 
of th,s ,s not given. Instead it is argued that the high-frequency components should 
be Ig nored because they will contain much of the image noise. 

Another example of this approach is the work of Hara.ick. In Hara.ick's 1980 
•rt.de, he proposes a fitting of the image by small pianar surfaces or "facets- 
Edges « mked at points which beJong to ^ such ^ when ^ param ^ ^ 

e tw su ^ inconsJstent ^ ^ ^ consistency ^ bas£d Qn ^^ 

f fit o each surface within its neighbourhood and uses a squared statistic. Again 
the m,,a. surface fitting invq.ves a set of parameters which do not completely 
represent the image surface. In this case there are 3 parameters over a square 
support of somewhere between 4 and 25 pixeis. The three parameters are in fact 
estates of the x and y slope and the average value over the support. 

In subsequent work on edge detection using a more genera, surface fitting 
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technique, Haralick (1982) has used higher order polynomial basis functions with 
larger operator supports. The later scheme locates edges at zero-crossings in the 
second derivative of the modelled image surface in the image gradient direction. It 
uses cubic polynomials (in x and y) as the basis functions over a square support of 
(typically) 121 pixels. Interestingly, this choice of basis functions yields an operator 
which can be shown to be quite similar to the operator described in this report. 
However, if higher order or lower order polynomials are used, performance will be 
worse. We now demonstrate this similarity. 

The polynomials used are the discrete Chebychev polynomials, denoted P%(r) f 
and for simplicity we will consider a one-dimensional problem. The objective of 
surface fitting is to find the coefficients aj such that the sum 

Q(r) = £ ai Pi{r) (7.8) 

«=0 

gives the best square-error fit to the actual sampled image surface I(r). That is we 
seek to minimize the value of 



e 2 



= E (*r) - E *Pi(r)) 2 (7.9) 

r=-fl V *=0 ' 



We do this by setting to zero the partial derivatives of e 2 with respect to each 



of the ay. 



R n 

E Ur) - E o^MV/M = o 



This leads to the solution of a system of linear equations in the ay, but in the 
case where the polynomials are orthogonal the system is diagonal and the solution 
is simply 



1 R 



ay = - £ Py(r)/(r) (7.10) 

P] r=—R 



where 
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H = E P)(r) 



r=-R 



The method then estimates the first and second partial derivatives of this 
modelled surface. For example the first derivative is 

Tr Q(ro) = |>f*(ro) (7 . n) 

Substituting equation (7.10) into (7.11) we obtain 

The important thing to note about this equation is that it is linear in the 
sampled image intensity, and that therefore the operation of surface fitting followed 
by denvative estimation can be represented as a single convolution. We find the 
eomvalent filter for this convolution from (7.12). Sinc e this expression has the form 
of a discrete convolution over r, by removing the summation over r and the input 
term I(r), we obtain the impulse response of the equivalent filter 

The derivation of the expression for the second directional derivative is simHar 
The n ext step in the surface fiu . ng approach k to ^ ed ^ ^ ^ • 

» the second directional derivative. These will correspond to maxima in the first 
denvat-ve given above. This puts us in the position of being able to directly compare 
the surface fitting approach to the variational approach described in this report 
Both methods are effectively marking edges at the maxima in the output of the 
convolution of the image with some linear operator. We can use the optimality 
cntena that we defined for step edges to (analytically) eva.uate the performance of 
the surface fitting operator. 
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The form of the equivalent filter depends directly on the choice of basis 
functions Pj. The equivalent filter for the Chebychev basis polynomials up to 
degree three is shown in figure (7.1). It turns out that the choice of cubic basis 
polynomials gives the best approximation to the optimal operator derived in this 
report. The perceptive reader may note that the surface fitting and gradient 
estimation procedure is equivalent to convolving with a function that is the best 
approximation to a derivative function (within the constraints imposed by the basis 
functions) i.e. the filter has an impulse response that is the first derivative of a 
delta function. Reference to (7.13) shows that as the order of the basis functions 
becomes large, the filter f(r) tends to a simple local gradient estimator, similar to 
the 3-pixel Prewitt operator. The equivalent filter for n = 7 is shown in the second 
frame of figure (7.1). 

Thus the 3-order Chebychev polynomials give the best performance with this 
approach, while higher order polynomials lead to operators that are approximations 
to local derivative operators. This answers one of the questions raised by Haralick 
in his article as to what order of polynomial functions is best. The answer to the 
other question raised, viz. what form of basis functions to use, can also be answered, 
since his criteria of performance are essentially the same as ours. Note that these 
criteria were used to experimentally evaluate the performance of the surface fitting 
operator, but did not appear explicity in the design. The optimal surface fitting 
operator for step edges would use a single basis function which is the first derivative 
of Gaussian derived here. Fitting and gradient estimation using this single basis 
function is equivalent to convolution with the same function. 

So we see that the ultimate performance of the surface fitting approach is 
determined entirely by the choice of basis functions. However, no analysis was done 
in Haralick (1980) or in Huedkel (1970) as to the optimality of their respective 
sets of functions. Other advocates of the surface fitting approach have made more 
detailed analysis of the basis functions. For example Hummel (1978) suggested the 
use of Karhunen-Loeve principal components for the basis functions. 
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to JST, ft EqUiValeDt fi,ters f ° r -»ic basis functions (a) and for basis functions 



7.2. Derivative Estimation 

Since an ideal step edge is a rapid transition from one intensity value to another 
■t seems that a reasonab.e way to detect edges is to estimate some derivative of the' 
■nrage lntensi ^ fa ^^^ ^^ _ ^ ^^ 

(«W), Prew.tt (1.70), Rosenfeld and Thurston ( 19 71) ( Mac.eod ( 19 70) and a 
vanety o others. There has also been some interest in operators that itimat 

he snd denvative of the image intensity. The operator of Modestino and PVies 
(1977) est.mates a Gaussian smoothed Lap.acian using ? computationally efficient 
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recursive filtering algorithm. Herskovits and Binford (1970) used a form of lateral 
inhibition to reduce the sensitivity of their operator to slow gradients, and then 
followed with first and second derivative estimation to locate the edges. Recently 
there has been interest in operators which locate zero-crossings in the second 
directional derivative in the gradient direction, viz. Havens and Strikwerda (1983), 
Torre and Poggio (1983), Yuille (1983) and Haralick (1982). 

We should note that the operator derived in chapters 2 and 3 has very strong 
similarities to two of the above operators. In particular, we have been using the first 
derivative of a Gaussian to approximate the optimal operator derived in chapter 2. 
The simplest two-dimensional extension of this used a Gaussian projection function, 
which results in a two-dimensional operator which is very similar to Macleod's. 
It also bears a strong resemblance to the Marr-Hildreth operator, at least in one 
dimension, as we shall see in a moment. 

It has been argued in this report that the optimal edge detection function 
should be asymmetric (see section 2.1), and it may therefore be viewed as a first 
derivative operator. However, it was not designed to optimally estimate gradient, 
but to detect step edges. This distinction is subtle, but it should be stressed at this 
point. The argument for derivative estimation is that the image gradient attains a 
maximum at the centre of a step edge, and that therefore edges may be detected 
by finding maxima in gradient. However, it does not follow that gradient is the 
best measure to use to detect and localize edges. Marr and Hildreth (1980) suggest 
the use of the slope of the output of the Laplacian of Gaussian operator. Again the 
observation is that this quantity is proportional to the edge strength. 

We should really be trying to estimate the "edgeness" of a potential edge. 
However such a measure can only be defined implicitly by a variational equation, 
such as equation (2.12). The tendency has been to use a posteriori measures, such 
as gradient or zero- crossings of some derivative, as evidence of edges. The fact 
that high gradients occur near edges do not mean that all points of high gradient 
correspond to edges. 

Since in one dimension the zero-crossing of second derivative operator (Marr 
and Hildreth) is essentially the same (ignoring the thresholding question for the 
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moment) as a n,a ximum f flrst derivatjve ^ 

re c lutlon we wou]d expect sjmi]ar perfomance from Pjr — 

In two d lme ns,ons the extensions are different and n,rf • 

different. We now show that th t , P^orma„ce „ noticeably 

now show that the two d.mensional Laplacian of Gaussian gives 
Poorer location than the direetiona. operator described in chapter 3 Let T 
two-dnnens.ona, Lapiacian of Gaussian be described b y the elation 



"* 2 + y 2 




(7.14) 

Now by the method of chapter 3, the standard deviation of the position of 
th ero.cross.ng . the q uotient of the s,ope of the zero-crossing at the edge ce t 
«- the root .ean Sq uared noise in the opera.r output. Let the input be ^ 
of a-nphtude A in the y direction, i.e. the elation of the input S (x , y) is ' 

S(x f y) = Au-^x) 
Then the slope of the zero-crossing is 



dxo 



/ x o /-f-«> 
-ooT-oo AL ( x 'V)dydx 

and the root mean squared output noise is 



(7.15) 



n m \l / 



L 2 (x, y) dy dx 



1* 



(7.16) 



localization A L of the Laplacian of Gaussian is just 

A L = 1 
the ed^ 7 T thSS ^^ ^ '° Ca,iZati0n ° f a direCti0naI -*— •%«- with 
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Again we find the quotient of (7.15) and (7.16) and substitute from (7.17) and we 
obtain for the localization of this operator 



A D = J? ss 1.63 
V ^ 

So on average we would expect the positional error of the Laplacian of Gaussian 
to be about 60% greater than that of a directional operator of the same a. 

It has been suggested that the strength of a zero-crossing may be estimated 
from the slope of the zero-crossing (normal to the edge direction). We should also 
compare the signal to noise ratio of this measure with signal to noise ratio of the 
first directional derivative. The slope of the zero-crossing of a Laplacian of Gaussian 
is again given by equation {7.15), while the noise in this value can be found from 
the integral 



raoo 



L L {**•*) dydx 



(7.18) 



This gives the Laplacian of Gaussian a signal to noise ratio Sx,, formed from 
the quotient of equations (7.15) and (7.18), of 



Si 



= # 



Finally we compare this value with the signal to noise ratio of the two- 
dimensional directional derivative operator 

/ x 2 -\-y 2 \ 
D(x,y) = xexp^ ^T~) 

which turns out to have a E value of 

129 



£d = 2a 



LapJI 7c ° WS ; hat th£ direCti ° nal ^ d ^- «*«*» »ette rs the 

2 rri 1 1 s,an , (wi pe estimation) by a fact ° r ° f Sli ^^ "— *« 

w th respect to s.gnal to noise ratio, and by a factor of about 1.6 with respect to 
location. In terms of the composite criterion SA we find that 



ZdA d = 4E L A L 



The above result shows that the slope of a zero-crossing is a very poor estimator 
for edge strength. While there are other possible choices for the edge 1X1 
esWor, we also find that the Lap.acian of Gauss.an still suffers „ ^ 
y caparison with a directional operator. The intuitive reason for this i 
two^mensmna, Lap.acian may be decomposed into the sum of second deriv i 
n (any) two orthogona. directions. If one of these is chosen to be norma, to t T dj 

it * 7 lear that this contributi ° n ,s ™* *- - * — -p. :: 
ir ;:: r: po r which win be pm * the - *— . ~ 

nothmg to locahzatmn but will increase the amount of noise. 
7.3. Frequency Domain Methods 

In this (rather small) category, we find one particular example of an approach 
wh,c used criteria very similar to ours, using a fluency domain derivatTw 

r d ;r: 7T Mon to iead to - — - — - - , :2 

d,d not, but th.s was due to a rather unfortunate restriction p.aced on the 
« so]ution . men the resWction ;s removedi ^ o J d J h 

wh,ch approbates a first derivative of a Gaussian. 

The method is that of Shanmugam et a. ( 197 9), who proposed the use of a 
t^mensmna, linear operator that approximates the Laplacian of a Gall 
The, r entena of optima.ity were that the function maximize the proportion of tot , 
-P" energy confined to a fixed interva, when it is convolved with a step ed 
Also the functmn must be strictly band-limited. These two criteria approxL.ly 
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capture those of the present design. Maximizing the proportion of total output 
energy in the interval will limit the range over which the maximum in the response 
to a step edge can occur. Band-limiting greatly improves output signal to noise 
ratio, since the spectrum of Gaussian noise is flat while the spectrum of a step edge 
varies as the inverse of frequency, i.e. most of the energy in the edge is concentrated 
at low frequencies. 

Unfortunately, there are two steps in the method of Shanmugam et al which 
the present author finds hard to justify. The first is that they made no attempt 
to mark edge points, but instead thresholded filtered values were output. In fact 
their filter gives two peaks in its response to an ideal step, but these are on either 
side of the centre of the step, and the response at the centre is actually zero. This 
was rectified to some extent in the work of Marr and Hildreth (1980) who used the 
zero-crossings of the same filter, since these features occur at the centre of step 
edges. 

A second problem is that they assumed a priori that the operator they were 
looking for would be trivially extensible to two dimensions by rotating it about 
an axis of symmetry. This immediately restricted them to symmetric operators, 
even in one dimension where the design was done. As we have seen in chapter 3, 
this restriction is unnecessary, and actually degrades performance. In fact if the 
restriction is removed, the same analysis leads to an operator that approximates 
the first derivative of a Gaussian, as used in the current design. We repeat their 
design without the assumption of symmetry now. 

Once again we perform an optimization to find the function that extremizes 
one criterion while another is kept constant. In this case the bandwidth of the 
response fi will be fixed while the fraction of total output energy in an interval 
[—7/2, -+-//2] is maximized, i.e. if the output response is g(x) and the fraction of 
the energy in the interval is a,* we maximize 

.+//2 



_ lift |g(s)| 2 dx 
f±Z\g(x)\*dz 



* = h!,.;:,. w 



We can make use of the fact that there exists a set of functions ipi(x) the 
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interval |-//2, +// 2 ] and orthonormal over (-oo, +00). i.e. 



U, t = 7 



•'.J = 0,1,2,... ( 7 .2) 



7 lX <( t=y M - 0,1,2,... (7.3) 

withXi < 1 for all i, and X > Xl > X 2 > ... 

The prolate spheroida. wave functions are complete in the space of band-hnnted 
functus, and hence the output fro m an y handled n.ter can he represent 

«(') = r o-^c, x) ( ., 

n=0 l'- 4 J 

Where the constant c is a function of the handwidth and the si 2 e of the interval 



c = 

2 



«tr:rr " ' w •■ -— - *■» - - — - M 



and (7.3), the value of a becomes 



a = 2SLol«n( 2 X n 

£~=o M 2 (7-5) 



The X; are all positive and X > X, -> \„ \ 

ao ^ A! > a 2 > . . ., so a is bounded by 



< a < ^E^oKf __ 

££=ok| 2 ~ X ° < * 



upper bound is attained when o B = 0forr,Nn *u 

u lor n > 0, so the optimal output is 



9(x) = a ^o(c,z) , 
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Since this is the desired step response of the filter, we can obtain the impulse 
response f(x) by differentiation. 

f(x) = a Q — t/>o(c,x) (7.7) 

An approximation to the functions ipn{c, x) due to Slepian (1965) can be used to 
find a closed form expression for f(x). 

where H n (x) is a Hermite polynomial of degree n. This approximation is useful for 
x < c~ X I A and n < c. So V>o(c, z) may be approximated by a Gaussian for small 
x, and the optimal spatial function f(x) will be the first derivative of a Gaussian 
as before 



f(x) = (/cx)exp(-|^-J 



In their original article, when Shanmugam et al (1979) assumed that the 
function f(x) should be symmetric, they were restricted to the odd prolate spheroidal 
functions ignoring ipo(c, x) which in fact gives the best performance. The X;(c) may 
be used as performance indices since they measure the fraction of the total energy in 
the specified region for the corresponding V't- The values of Xo may be significantly 
higher than those of Xi for small values of c. The small values of c imply that the 
product of spatial and frequency extent are minimal. For example at c = 0.5 the 
value of Xo is about 0.3 while the value of Xi is 0.0086 (see Slepian 1960). 

The intent of this chapter has been to put the present edge detector in 
context with several other well-known schemes. We have seen that there are strong 
similarities in analytic form with several of these schemes, in particular with the 
detectors of Marr and Hildreth, Macleod, Haralick and Shanmugam et al. There 
are also important differences, for example we have not yet considered the use of 
multiple operators or of highly directional masks. Rosenfeld (1971) used several 
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-s dlfference of boxeSj and formed a composjte ^^ 

to he ^ operator whjch djd ^ haw a sjgnmc jower P J*~ 

next small ato , Ma „ (1Q76) ^ both ^ P J- 

and o r multlple scales , but reneged Qn ^ first ^ - 

J0 S% b of the apparent difficu)ty jn jmpiementati ^ 

des,n produc es an edg e map which is very Slmilar fa tQ ~ 

or ^ Primal sketch , whlch was motivated purdy by 2; 
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8. Conclusions and Suggestions for Further Work 

We began this report with a precise definition of a set of goals for edge detection 
and proceeded to derive an operator which best achieved these goals. The goals 
were carefully chosen with minimal assumptions about the form of an optimal edge 
operator. The constraints imposed were that we would mark edges at the maxima 
in the output of a linear shift- invariant operator. By expressing the criteria as 
functionals on the impulse response of the edge detection operator, we were able 
to optimize over a large solution space, without imposing constraint on the form of 
the solution. 

Using this technique with an initial model of a step edge in white Gaussian 
noise, in chapter 2 we found that there was a fundamental limit to the simultaneous 
detection and localization of step edges. This led to a natural uncertainty relationship 
between the localizing and detecting abilities of the edge detector. This relationship 
in turn led to a powerful constraint on the solution, i.e. that there is a class of 
optimal operators all of which can be obtained from a single operator by spatial 
scaling. By varying the width of this operator it is possible to vary the trade-off 
in signal to noise ratio versus localization, at the same time ensuring that for any 
value of one of the quantities, the other will be maximized. 

It was then found that the goals as originally specified were not well defined, 
or rather that the analytic criteria did not articulate all that we expected of the 
edge detector. By adding an explicit criterion related to multiple responses, we were 
able to obtain an operator that met all of our intuitive design goals. The multiple 
response constraint did add considerable complexity to the form of the solution and 
in fact it was not possible to realize a solution in fully closed form. However, the 
analysis was able to constrain the solution to a finite (low) dimensional parameter 
space over which a numerical solution could be obtained. The impulse response of 
the operator is a sum of damped exponential cosines, and it can be. approximated 
by the first derivative of a Gaussian. 

We then extended the above operator to two dimensions and in doing so we 
followed the framework that was established for the one-dimensional formulation 
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exten S ,on to two d.mens.ons. It was found that multiple operator 
w.dths were „ece SSa r y to dearth different signal to noise rat,o S in an^T 
ecause of the trade-o ff de S er ibed above. We also found that direet.o,!, pe Z 
W clear advantages over non-directional operator, and that the more dir J^ 

an operator 1S the better its potentia, performance. The tradiUona. probleL 
a 3oc lated Wlth high]y directjonal ^ ^^ ^ b^m 

f fi measure to decde whether each directional operator could he used. We 2 

h::zt probiem of combming an these — — * - 

oherent whole. Once agarn we used the same set of design g oa ls to guide the 

riirrt 6 the appropriate *~ - *— — - « 

as a means of combmmg the outputs of operators whose responses to the same 
feature are not necessarily spatially coincident. 

The first selection heuristic was to give preference to operators of minimum 

w ,d h proviaed they had sufficient signaI to noise -" 

jolutmn^ Nation for a given g ,ohal sig nal to noise ratio. CmZl 

; ii rr widths was difficu,t because ° f the iack ° f — — 

01 different operator outputs. It was necessarv tn „«„ f <■ 

was necessary to use feature synthesis fexamDles 
were glV en m chapter 6) for the operator integration. 

The second heuristic was to favour high.y directional operators whenever they 
have suffice q ua,ity of fit to the image. The integration of different operate 

::: : :r as re,ativeiy simp,e Md required ^ * *** - — 

rm of non-ma Xl mu m suppression. Examples of this technique were given in 
chapter S It is unclear which goodness of fit measure should he used, and II 
an algonthm was presented which performs adequately, there was no demonstrat on 
o . opt.maht, In fact there is some evidence (section , 4) that the human ^ 
Mem, winch m other respects demonstrates simi.arities t. the detector described 
here, uses a different (or more complicated) decision procedure. 

To make the convolution of images with the optima, operator more efficient a 

rs derive of a Caussian approximation was used. This allowed us to us ^ 

of the efficent algorithms presented in section 3. 2 to spe.ed things up. It was found 
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that there exists an approximately "linear" time (in the width of a square mask) 
algorithm for doing convolutions with arbitrary masks, and so efficient convolution 
is possible even without the approximation. Experiments are now being performed 
to determine the practicality of implementing this scheme in hardware. 

In chapter 4 we were able to generalize the method used for step edges in 
white Gaussian noise to arbitrary features and to non-white but stationary random 
noise. In addition to a general form for the criteria, a fast numerical method for 
the solution was described. This technique was then used to find optimal operators 
for ridge and roof features. The ridge detector was extended to two dimensions, 
and this was found to be much more difficult than for the edge operator because of 
the lack of reliable information about the ridge direction. An example of the ridge 
detector output appears in section (6.3), and was compared to the edge detector on 
the same image. 

Finally in chapters 6 and 7 some comparisons were made between the edge 
detector derived here, the Marr-Hildreth (1980) operator and the difference of boxes 
operator. There were both analytic and experimental comparisons. It was also 
compared to two other edge operators, those of Haralick (1982) and of Shanmugam 
et al. (1979), and was found to be similar to all of these in one dimension. Chapter 6 
also included several examples of the edge detector output on some natural scenes. 
Finally we saw in section 6.5 some perceptual effects which seem to indicate that 
the human visual system uses a similar feature combination scheme. 

There are several directions in which the work in this report could be continued. 
The most obvious is probably the area of integration of feature descriptions. The 
algorithm as described in this report includes a feature synthesis method to combine 
output of several operators of differing width. It could potentially be used to 
combine the outputs of detectors for different features, such as the ridge detector 
described in chapter 4. It may be possible to form criteria on the performance of a 
feature integration scheme. Two possible criteria are 

(i) The integration scheme should not miss features. If a single feature is marked 
by one of the detectors, it should be marked in the integrator output. Also, 
if two feature detectors are responding at the same point in a way that is 
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n °; "r^ Wfth therC b£ing "*> a *"* '«*« in the image, then the 
integrator should mark both features on the output. 

00 The ; tegrati scheme shouW ^^^ ^ ^^^ ^^ [njnimaj 

An obv.ous scheme whjch performs wen accordjng ^ y- 

• My nrnrk everything S ee„ by any feat u re detector. I„ this C J in J f ^ 
oHe t f matj0n „ occurring) e g ^ ig _ £« 

two parallel closely spaced edires if it h.. . j . 

is ™» f * • already been identified f^ what it 

-■ The eature m te gra t or may be viewed as a filter on the outputs of a„ th 

feature detectors which removes redundant inflation. 

Both of the above criteria require a scheme which is no„-loc a l that i, •♦ * 

:r :r nt : outputs ° f ~* — — - ££- -:: 

po. t ^ature synthes.s is a „ o „, c a , scheme as i s the surface fiuing ^ 
used ,n the To P o gra phic Prima, Sketch of Ha r a ,ick (1988) . It wo J ^ 
com pa rm g the two schemes .cording to the above criteria The dimcu^ 
usm g an optimization scheme to find an optimal feature integrator is thaT K 

more complicated image mode, would be necessary At the & 

+„ • i j ,, , necessary. At the very least t would ne^d 

o .nclude all those features for which the individua, detectors were design l d 
presumably all possible combinations of those features. 

Another possibie extension of the edge detector wou!d be to 3 or more 

~n, ^ haVe *"* - " *«*« 3 that there is a simple extension 
the o ptlmal operator t0 n dimensions Thjs operator ]ocates » ^2 

n-d.mensm al tension of edge contours) ^ ^^ -^ 

Usmg hl gh,y directional operators is more difficuit in this domain bec.use oTI 
Wge number of directions needed to uniformly cover an n-sphere. Non" 
suppress^ ,s a.so more compiicated for the sa me re.son. 

In particular it is possible to consider the detection of moving edges as a th™ 
agonal edge detection problem, where the third dimension is Tt 1 ^ 
Edges ,n h,s three dimensional space correspond to edges in the two-dim^' 
-ge or to points of rapid.y changmg irradiance. The direction of the ed * 
the three-space can be used to determine the velocity of the two-dimensional eg 
There . a constraint that the time-space edge fi.ter must be causa,, that is it £ 
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depend only on past and present intensity values. Using a filter that is broad in the 
time domain will introduce a delay into the velocity estimate. Thus the design of 
the time domain filter is a separate optimization problem which requires additional 
constraints of causality and minimal time delay. 

One final generalization of the techniques described in this report would be 
to relax the restriction of linearity on the operator. Shift invariance is clearly a 
desirable property of an edge detection operator, but it is not clear that the optimal 
operator must be linear. In fact the composite operator derived in this report is 
non-linear because it involves a non-local predicate (from the feature synthesis 
scheme) applied to several operator outputs. Ideally this constraint should be either 
relaxed or it should be proven that linear operators can perform as well as non-linear 
operators. The restriction to linear operators here was necessary because of the 
sheer complexity of parametrizing a non-linear shift invariant operator in a form 
which would allow variational methods to be applied. It remains to be seen whether 
this restriction penalizes performance, and whether an unconstrained non-linear 
operator can do any better. 



139 



Appendix I. Definite Integrals used in the Derivations 

These integrals were referred to in chanter 9 k„* 

because „f *h • • ■ W6re n0t mcluded there 

Decause of their excessive ength Ther P arP q ;^ i 

gUh 1Jlere are 3 integrals required to evaluate (2.12) 
and an additional integral is neccesary for (2 24) Of th. a ■ ♦ , 
writtPn in fk ( j ' eSe 4 mte S ral s, 3 can be 

written in the same parametric form because thpv a ii • i ^ 

- , ' Decause the y a " involve the integral of the 

square of a function of the form 



*(*) = C ^ aisin ^ + c 2e -cosa;x + C3e--sina;x + c 4 e--co S ^ 
We now define 



h{c u c 2t c 3 , c 4 ) = £ g(x) dx and ^^ c2j C3> c4) = ^1 



^ g\x)dx 

And we find that all of the integrals in tn* ^,f 

ne integrals m the performance criteria can be written 
in terms of I x an d J 2 thus 



J^fWdx = /iKa 2 , a3 , a4 ) + c 
y+i 2 



«4) + 2c 2 



f +1 > 2 

;_, /' w dx = h(aai __ wa2> aa2 + wfli> __ Qa3 __ ^ _^ ^ ^ 

U f" to dx = 7 2 (/? ai - 7a2 , ^ + 7flij ^ + ^ ^ __ ^ 



where 
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P = a — ijj and 7 = 2aw 

The closed form for the definite integral I\ in terms of a, a;, and c\ through 
C4 and for a unit interval (W = 1) is 

A(ci,C2,C3,C4) = — 5— ■ — r(cie a (asina; — wcoscj)-fci;ci 
or -|-arV . 

-j-C2e a (u; sin a; -f a cos u) — olci 
-J-C3e —a (asina; -|- a; cos a;) — ucz 
-\-c4e~ a (w sin u — a cos u) + ac4 J 

Similarly the closed form for I2 over the unit interval is 



2, .3 



^2(ci,C2,C3,C4) = - — 3 — ^lc\e 2a (—au 2 sin 2w — a 2 uj cos 2lj -{-lj s -\- a 2 uj) — c\uj 

-\-c\z 2a (au 2 sin 2w -f a 2 oj cos 2a; -f- a; 3 -f- a 2 a>) 

— c^a; 3 + 2a 2 w) 

4-Cge~ 2a (— aa; 2 sin 2a; -f a 2 u cos 2a; — a; 3 — a 2 w) -f CgW 3 

-fc 2 e — 2a (aa; 2 sin 2a; — a 2 u> cos 2a; — a; 3 — a 2 a;) 

+c 2 1 (a; 3 + 2a 2 a;) 

4-2ciC2e 2a (a 2 a; sin 2a; — aa; 2 cos 2a;) + 2ciC2«a; 2 

+2cic 3 (a 3 + aa; 2 )(2a; — sin 2a;) 

— 4(ciC4 + C2C3)(a 3 + aa; 2 ) sin 2 u 

-f2c 2 c 4 (a 3 + aa; 2 )(2a; + sin 2a;) 

+2c3C4e — 2a ( — a 2 w sin 2a; — aa; 2 cos 2a;) -j- 2c 3 C4aa; 2 J 
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